SOLVED
I'm trying to replicate the code used in the Reproducible Finance with R webinar. A link to the webinar is here: https://www.rstudio.com/resources/webinars/reproducible-finance-with-r/
The data for the exercise was downloaded from Yahoo Finance. The .csv file is here: http://www.reproduciblefinance.com/data/data-download/
I followed the instructions given in the webinar, but the following code does not work as it does in the lesson:
portfolio_returns_tidyquant_rebalanced_monthly %>%
mutate(
dplyr_port_returns = portfolio_returns_dplyr_byhand$returns,
xts_port_returns = coredata(portfolio_returns_xts_rebalanced_monthly)
)%>%
head()
R produces no output and does not report an error either.
I then removed the new variables from mutate() one at a time to see what happens. It turned out that if the xts-based variable is left out, I get part of the output I need. The code and its output are below.
portfolio_returns_tidyquant_rebalanced_monthly %>%
mutate(
dplyr_port_returns = portfolio_returns_dplyr_byhand$returns,
# xts_port_returns = coredata(portfolio_returns_xts_rebalanced_monthly)
)%>%
head(2)
Date        returns        dplyr_port_returns
2013-01-31   0.0308487341   0.0308487341
2013-02-28  -0.0008697461  -0.0008697461
In addition, some information about variables:
class(portfolio_returns_xts_rebalanced_monthly)
[1] "xts" "zoo"
class(portfolio_returns_dplyr_byhand)
[1] "tbl_df" "tbl" "data.frame"
The portfolio_returns_xts_rebalanced_monthly was created using the following code:
symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")
prices <-
getSymbols(
symbols,
src = 'yahoo',
from = "2012-12-31",
to = "2017-12-31",
auto.assign = T,
warnings = F
) %>%
map(~Ad(get(.))) %>%
reduce(merge) %>%
`colnames<-`(symbols)
w <- c(
0.25,
0.25,
0.20,
0.20,
0.10
)
prices_monthly <-
to.monthly(
prices,
indexAt = "lastof",
OHLC = FALSE
)
assets_return_xts <- na.omit(
Return.calculate(
prices_monthly,
method = "log"
)
)
portfolio_returns_xts_rebalanced_monthly <-
Return.portfolio(
assets_return_xts,
weights = w,
rebalance_on = 'months'
) %>%
`colnames<-`("returns")
I'm pretty sure this is related to the mutate() function and the classes of the variables, but I couldn't find any information on the matter. Any help is highly appreciated.
UPDATE.
Converting one object from xts to data.frame and adjusting the code slightly solved the issue.
The updated code:
portfolio_returns_xts_rebalanced_monthly_df <-
data.frame(
date=index(portfolio_returns_xts_rebalanced_monthly),
coredata(portfolio_returns_xts_rebalanced_monthly)
)
portfolio_returns_tidyquant_rebalanced_monthly %>%
mutate(
dplyr_port_returns = portfolio_returns_dplyr_byhand$returns,
xts_port_returns = portfolio_returns_xts_rebalanced_monthly_df$returns
)%>%
head()
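A likely explanation for the original failure, offered as a sketch rather than a confirmed diagnosis: coredata() on a one-column xts object returns a one-column matrix, not a plain vector, so mutate() receives a matrix where a vector is expected. The base R example below uses a hand-made matrix as a stand-in for coredata(portfolio_returns_xts_rebalanced_monthly); the names ret_matrix, ret_vector, ret_by_name and demo are made up for illustration.

```r
# Stand-in for coredata(<one-column xts>): a 2 x 1 numeric matrix.
# The two values are the returns printed in the partial output above.
ret_matrix <- matrix(
  c(0.0308487341, -0.0008697461),
  ncol = 1,
  dimnames = list(NULL, "returns")
)

is.matrix(ret_matrix)  # TRUE: this is not a plain vector
dim(ret_matrix)        # 2 rows, 1 column

# Two ways to turn the one-column matrix into a plain numeric vector
ret_vector  <- as.numeric(ret_matrix)
ret_by_name <- ret_matrix[, "returns"]

# A plain vector can be added as an ordinary column
demo <- data.frame(
  date = as.Date(c("2013-01-31", "2013-02-28")),
  xts_port_returns = ret_vector
)
str(demo)
```

Under the same assumption, writing xts_port_returns = as.numeric(coredata(portfolio_returns_xts_rebalanced_monthly)) inside mutate() should work without the intermediate data.frame.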
I tried to run a panel VAR on a dataset I got from Statistics Sweden, and here is what I get:
df<- read_excel("Inkfördelning per kommun.xlsx")
nujavlar <- pvarfeols(dependent_vars = c("Kvintil-1", "Kvintil-4", "Kvintil-5"),
lags = 1,
transformation = "demean",
data = df,
panel_identifier = c("Kommun", "Year")
)
Error: Can't subset columns that don't exist.
x Column `Kvintil-1` doesn't exist.
I often get this message too:
Warning in xtfrm.data.frame(x) : cannot xtfrm data frames
Error: Can't subset columns that don't exist.
x Location 2 doesn't exist.
ℹ There are only 1 column.
I have made sure that all the data is numeric. I have also tried clearing my workspace and restarting the program. I also tried converting the data into a panel data frame with the plm package, and converting my entity variable "Kommun" (Municipality) into a factor, but it still doesn't work.
Here's the data if someone wants to give it a go.
https://docs.google.com/spreadsheets/d/16Ak_Z2n6my-5wEw69G29_NLryQKcrYZC/edit?usp=sharing&ouid=113164216369677216623&rtpof=true&sd=true
The column names in your data frame are Kvintil 1 and so on, not Kvintil-1, so the variable you are referring to really does not exist. Be aware that syntactic variable names in R cannot contain hyphens or spaces. Names like that only work when quoted with backticks, which is annoying, so it is good practice to avoid them. I have included a reproducible example below.
library(tidyverse)
library(gsheet)
library(panelvar)
url <- 'docs.google.com/spreadsheets/d/16Ak_Z2n6my-5wEw69G29_NLryQKcrYZC'
df <- gsheet2tbl(url) %>%
rename(Kvintil1 = `Kvintil 1`) %>%
rename(Kvintil2 = `Kvintil 2`) %>%
rename(Kvintil3 = `Kvintil 3`) %>%
rename(Kvintil4 = `Kvintil 4`) %>%
rename(Kvintil5 = `Kvintil 5`) %>%
as.data.frame()
nujavlar <- pvarfeols(
dependent_vars = c("Kvintil1", "Kvintil4", "Kvintil5"),
lags = 1,
transformation = "demean",
data = df,
panel_identifier = c("Kommun", "Year"))
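Before calling pvarfeols, it can help to print the exact column names and make them syntactic in one step instead of renaming each column by hand. The sketch below uses a small made-up data frame (demo_df, with invented values) in place of the real spreadsheet; only the column naming pattern is taken from the question.

```r
# Made-up stand-in for the spreadsheet; check.names = FALSE keeps the spaces
demo_df <- data.frame(
  Kommun = c("A", "B"),
  Year = c(2019, 2019),
  `Kvintil 1` = c(0.1, 0.2),
  `Kvintil 4` = c(0.3, 0.4),
  check.names = FALSE
)

# Inspect the real names first: the hyphenated name is not among them
names(demo_df)
"Kvintil-1" %in% names(demo_df)  # FALSE

# Remove the spaces from every column name at once
names(demo_df) <- gsub(" ", "", names(demo_df), fixed = TRUE)
names(demo_df)

# Sanity check: a syntactic name is left unchanged by make.names()
all(names(demo_df) == make.names(names(demo_df)))
```

The cleaned names can then be passed to dependent_vars as plain strings such as "Kvintil1".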
I am trying to replicate this visual, but with my own data. This is the template I am working off of - https://r-graph-gallery.com/183-choropleth-map-with-leaflet.html
My intent is to highlight every country with a value in the same color. I might make it a heatmap or something - but right now adding the polygons gives an error so I cannot try any color options at all.
# Setup
library(leaflet)
library(rgdal)
library(here)
library(tidyverse)
# Basically copy pasted from the template, but the download did not work. I manually went to the website, downloaded the file, manually un-zipped, and manually dropped it in my working directory
# download.file("http://thematicmapping.org/downloads/TM_WORLD_BORDERS_SIMPL-0.3.zip" , destfile="DATA/world_shape_file.zip")
# system("unzip DATA/world_shape_file.zip")
world_spdf <- readOGR(
dsn= here() ,
layer="TM_WORLD_BORDERS_SIMPL-0.3",
verbose=FALSE
)
world_spdf@data$POP2005[ which(world_spdf@data$POP2005 == 0)] = NA
world_spdf@data$POP2005 <- as.numeric(as.character(world_spdf@data$POP2005)) / 1000000 %>% round(2)
# Example of my data - I have countries and numbers associated with them, although not every country has a number
country <- c("Algeria", "Argentina", "Australia")
values <- c(1,4,4)
my_df <- data.frame(country, values)
# This is how I am trying to add MY values to the map. I have to convert the map to a tibble, add my data, then convert it back to a map. Perhaps this is the problem?
interactive_data_attempt <- world_spdf %>%
as.tibble() %>%
left_join(my_df , by = c("NAME" = "country")) %>%
mutate(texts = replace_na(texts, 0),
exists = texts > 1) %>%
st_as_sf(coords = c("LON","LAT"))
# This is the method I used to do the exact same thing in a domestic US map
bins <- c(seq(0,1,1), Inf)
pal <- colorBin(c("white","#C14A36"), domain = interactive_data_attempt$exists, bins = bins, reverse = FALSE)
# This gives an error: Error in to_ring.default(x) : Don't know how to get polygon data from object of class XY,POINT,sfg
leaflet(interactive_data_attempt) %>%
addTiles() %>%
setView(lat=10, lng=0 , zoom=2) %>%
addPolygons(fillColor = ~pal(interactive_data_attempt$exists))
You use readOGR to get an sp object, but then you convert it to a tibble and then to sf? I'm not sure about sp, but in most cases you can handle an sf object like a regular tibble / data frame, i.e. left_join to it. You can also read a shapefile directly into sf with st_read.
Then there's something off with your variables, a copy-paste mix-up I would guess: in my_df you have values but never do anything with it, and in your mutate you use texts, but it's unclear where that comes from.
The binary palette is built from exists, a boolean that should indicate whether a value is present, though I'd assume you'd want to use my_df$values instead.
I left NA values as-is, changed the bins (to just 2) and adjusted some colours.
library(leaflet)
library(sf)
library(dplyr)
library(tidyr)
# download.file("http://thematicmapping.org/downloads/TM_WORLD_BORDERS_SIMPL-0.3.zip" , destfile="world_shape_file.zip")
# unzip("world_shape_file.zip",exdir = "world_shape_file")
world_sf <- st_read("world_shape_file")
world_sf$POP2005[ which(world_sf$POP2005 == 0)] = NA
world_sf$POP2005 <- round(as.numeric(as.character(world_sf$POP2005)) / 1000000, 2)
country <- c("Algeria", "Argentina", "Australia")
values <- c(1,4,4)
pal <- colorBin(c("blue","#C14A36"), domain = values, bins = 2, reverse = FALSE, na.color = "transparent")
world_sf %>%
left_join(
tibble(country, values),
by = c("NAME" = "country")) %>%
leaflet() %>%
addTiles() %>%
setView(lat=10, lng=0 , zoom=2) %>%
addPolygons(fillColor = ~pal(values), stroke = FALSE)
Created on 2022-11-12 with reprex v2.0.2
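A note on scaling and rounding in one expression: in R, pipe operators bind more tightly than / and *, so an expression of the form x / 1000000 %>% round(2) rounds the constant 1000000 and leaves the result unrounded. The sketch below shows the effect with the base pipe |> (this assumes R 4.1 or later), which sits at the same precedence level as %>%; pop is a made-up value.

```r
pop <- 12345678  # made-up population count

# Parsed as pop / (1000000 |> round(2)): only the constant is rounded
unrounded <- pop / 1000000 |> round(2)

# Either of these rounds the scaled value itself
rounded      <- round(pop / 1000000, 2)
also_rounded <- (pop / 1000000) |> round(2)

print(c(unrounded = unrounded, rounded = rounded, also_rounded = also_rounded))
```

Calling round() around the whole division, or adding parentheses around it, makes the intent explicit.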
I am using the R programming language. I am trying to follow the R tutorial over here on neural networks (lstm) and time series: https://blogs.rstudio.com/ai/posts/2018-06-25-sunspots-lstm/
I decided to create my own time series data ("y.mon") for this tutorial (the same format and the same variable names) :
library(tidyverse)
library(glue)
library(forcats)
library(timetk)
library(tidyquant)
library(tibbletime)
library(cowplot)
library(recipes)
library(rsample)
library(yardstick)
library(keras)
library(tfruns)
library(dplyr)
library(lubridate)
library(tibbletime)
library(timetk)
index = seq(as.Date("1749/1/1"), as.Date("2016/1/1"),by="day")
index <- format(as.Date(index), "%Y/%m/%d")
value <- rnorm(97520,27,2.1)
final_data <- data.frame(index, value)
y.mon<-aggregate(value~format(as.Date(index),
format="%Y/%m"),data=final_data, FUN=sum)
y.mon$index = y.mon$`format(as.Date(index), format = "%Y/%m")`
y.mon$`format(as.Date(index), format = "%Y/%m")` = NULL
y.mon %>%
mutate(index = paste0(index, '/01')) %>%
tk_tbl() %>%
mutate(index = as_date(index)) %>%
as_tbl_time(index = index) -> y.mon
From here on, I follow the instructions in the tutorial (replacing the "sun_spots" data with "y.mon"). Everything works fine until this point (I posted a question yesterday that got closed for being too detailed: https://stackoverflow.com/questions/65527230/r-error-in-is-symbolx-object-not-found-keras - the code can be followed from the RStudio tutorial):
#ERROR
coln <- colnames(compare_train)[4:ncol(compare_train)]
cols <- map(coln, quo(sym(.)))
rsme_train <-
map_dbl(cols, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_train
Error in is_symbol(x) : object '.' not found
I found another Stack Overflow post which deals with a similar problem: Getting error message while calculating rmse in a time series analysis
According to that post, the first error can be resolved like this:
coln <- colnames(compare_train)[4:ncol(compare_train)]
rsme_train <-
map_df(coln, function(col)
rmse(
compare_train,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>%
pull(.estimate) %>%
mean()
rsme_train
However, the next section of the tutorial contains similar code, and there the same error persists even after applying the correction:
compare_test %>% write_csv(str_replace(model_path, ".hdf5", ".test.csv"))
compare_test[FLAGS$n_timesteps:(FLAGS$n_timesteps + 10), c(2, 4:8)] %>% print()
cols <- map(coln, quo(sym(.)))
rsme_test <-
map_dbl(cols, function(col)
rmse(
compare_test,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
rsme_test
#errors:
Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
object 'model_path' not found
Error in is_symbol(x) : object '.' not found
These errors are preventing me from finishing the rest of the tutorial.
Can someone please show me how to fix these?
Thanks
Try passing coln directly to map_dbl instead of cols:
rsme_test <- map_dbl(coln, function(col)
rmse(
compare_test,
truth = value,
estimate = !!col,
na.rm = TRUE
)) %>% mean()
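To see what this calculation does without any tidy evaluation, here is a minimal base R sketch. The error message suggests that the . placeholder in quo(sym(.)) is never bound to a column name; looping over the column names as plain strings avoids the problem entirely. The sketch assumes that the prediction columns start at column 4, as in the code above. The data frame compare_demo, its values, and the pred_* column names are made up for illustration, and the RMSE is computed by hand rather than with yardstick::rmse.

```r
# Made-up stand-in for compare_train / compare_test
compare_demo <- data.frame(
  index  = 1:4,
  key    = "actual",
  value  = c(1, 2, 3, 4),
  pred_1 = c(1, 2, 3, 6),   # one prediction is off by 2
  pred_2 = c(NA, 2, 3, 4)   # one missing prediction, the rest exact
)

# Names of the prediction columns, kept as plain strings
coln <- colnames(compare_demo)[4:ncol(compare_demo)]

# Root mean squared error per prediction column, ignoring NAs
rmse_by_col <- sapply(coln, function(col) {
  sqrt(mean((compare_demo$value - compare_demo[[col]])^2, na.rm = TRUE))
})

# Average RMSE across all prediction columns
rmse_mean <- mean(rmse_by_col)
print(rmse_by_col)
print(rmse_mean)
```

Indexing with compare_demo[[col]] takes a string directly, so no symbol conversion or unquoting is needed.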
I desperately need help!
I am trying to predict drug use based on 5 characteristics: Age, Gender, Education, Ethnicity, Country. I have already built a tree model in R with rpart:
DrugTree3 <- rpart(formula = DrugUser ~ Age+Gender+Education+Ethnicity+Country, data = traindata)
a logistic regression model:
DrugLog <- glm(formula = DrugUser ~ Age+Gender+Ethnicity+Education+Country,data = traindata, family = binomial)
and a knn model:
KnnModel <- train(form = DrugUser~., data = ModelData,method ='knn',tuneGrid=expand.grid(.k=1:100),metric='Accuracy',trControl=trainControl(method='repeatedcv',number=10,repeats=10))
I saved those as RDS files and uploaded them successfully in Power BI.
I then created a table for each characteristic and created okviz filters for them.
Then I tried to predict whether a customer is classified as a drug user or a non-drug user based on the selections in the okviz filters. This is when everything went horribly wrong:
I created a custom R visual for each model prediction and inserted the following code in each visual:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset <- data.frame(chunk_id, model_id, model_str, AgeLabel, GenderLabel, CountryLabel, EducationLabel, EthnicityLabel)
# dataset <- unique(dataset)
# Paste or type your script code here:
library(dplyr)
from_byte_string = function(x) {
xcharvec = strsplit(x, " ")[[1]]
xhex = as.hexmode(xcharvec)
xraw = as.raw(xhex)
unserialize(xraw)
}
# R Visual imports tables with read.csv but no argument for strings_as_factors = F.
# This means some of the chunks are truncated (ie if they had a " " at the end).
# If you convert to a character and add a space if nchar == 9999 the deserialization works.
# (Thanks to Danny Shah)
dataset <- dataset %>%
mutate( model_str = as.character(model_str) ) %>%
mutate( model_str = ifelse(nchar(model_str) == 9999, paste0(model_str, " "), model_str) )
model_vct <- dataset %>%
filter(model_id == 1) %>%
distinct(model_id, chunk_id, model_str) %>%
arrange(model_id, chunk_id) %>%
pull(model_str)
finalfit.str <- paste( model_vct, collapse = "" )
finalfit <- from_byte_string(finalfit.str)
# get the user parameters
userdata <- dataset %>% select(AgeLabel,GenderLabel,CountryLabel,EducationLabel,EthnicityLabel) %>% unique()
# and then using them to make a prediction
myprediction <- predict(finalfit,newdata=data.frame(Age=userdata$AgeLabel,Gender=userdata$GenderLabel,Country=userdata$CountryLabel, Education=userdata$EducationLabel,Ethnicity=userdata$EthnicityLabel))
maxpred <- which(myprediction==max(myprediction))
myclass <- maxpred - 1
myprob <- myprediction[[maxpred]]
plot.new()
text(0.5,0.5,labels=sprintf("P(class = %s) = %s",myclass,as.character(round(myprob,2))),cex=3.5)
Error: Can't determine relationship between fields.
What has gone wrong here?
When I then clicked on the diagonal arrow to get to R Studio, this happens: Unable to construct R script data for use in external R IDE.
I need help as I am literally going crazy over this and I don't know how to resolve the issue! I would be really happy if you can help me
You made an error on line 34 and on line 25.
Below is a fixed version of your code.
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset <- data.frame(chunk_id, model_id, model_str, AgeLabel, GenderLabel, CountryLabel, EducationLabel, EthnicityLabel)
# dataset <- unique(dataset)
# Paste or type your script code here:
library(dplyr)
from_byte_string = function(x) {
xcharvec = strsplit(x, " ")[[1]]
xhex = as.hexmode(xcharvec)
xraw = as.raw(xhex)
unserialize(xraw)
}
# R Visual imports tables with read.csv but no argument for strings_as_factors = F.
# This means some of the chunks are truncated (ie if they had a " " at the end).
# If you convert to a character and add a space if nchar == 9999 the deserialization works.
# (Thanks to Danny Shah)
dataset <- dataset %>%
mutate( model_str = as.character(model_str) ) %>%
mutate( model_str = ifelse(nchar(model_str) == 9999, paste0(model_str, " "), model_str) )
model_vct <- dataset %>%
filter(model_id == 1) %>%
distinct(model_id, chunk_id, model_str) %>%
arrange(model_id, chunk_id) %>%
pull(model_str)
finalfit.str <- paste( model_vct, collapse = "" )
finalfit <- from_byte_string(finalfit.str)
# get the user parameters
userdata <- dataset %>% select(AgeLabel,GenderLabel,CountryLabel,EducationLabel,EthnicityLabel) %>% unique()
# and then using them to make a prediction
myprediction <- predict(finalfit,newdata=data.frame(Age=userdata$AgeLabel,Gender=userdata$GenderLabel,Country=userdata$CountryLabel, Education=userdata$EducationLabel,Ethnicity=userdata$EthnicityLabel))
maxpred <- which(myprediction==max(myprediction))
myclass <- maxpred - 1
myprob <- myprediction[[maxpred]]
plot.new()
text(0.5,0.5,labels=sprintf("P(class = %s) = %s",myclass,as.character(round(myprob,2))),cex=3.5)
Good Luck!
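The from_byte_string helper in the script reverses a particular encoding: the model is serialized to raw bytes, written as space-separated hex pairs, and cut into fixed-size chunks so that it fits into table cells. The base R sketch below shows the full round trip under that assumption. The helper to_byte_string, the chunk size of 9999 characters (inferred from the nchar check in the script), and the lm model on the built-in cars data are illustrative choices, not the original export script.

```r
# Encode any R object as one string of space-separated hex byte pairs
to_byte_string <- function(obj) {
  paste(as.character(serialize(obj, connection = NULL)), collapse = " ")
}

# Decode such a string back into the R object (same logic as in the script)
from_byte_string <- function(x) {
  xcharvec <- strsplit(x, " ")[[1]]
  unserialize(as.raw(as.hexmode(xcharvec)))
}

# Illustrative model on a built-in dataset
fit <- lm(dist ~ speed, data = cars)
model_str <- to_byte_string(fit)

# Cut the string into chunks of at most 9999 characters
chunk_size <- 9999
starts <- seq(1, nchar(model_str), by = chunk_size)
chunks <- substring(model_str, starts,
                    pmin(starts + chunk_size - 1, nchar(model_str)))

# Reassemble the chunks in order and restore the model
restored <- from_byte_string(paste(chunks, collapse = ""))
predict(restored, newdata = data.frame(speed = 10))
```

The chunks have to be concatenated in their original order and with every character intact, which is why the script restores a trailing space that may have been lost on import.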
I need to run a script for each station (so far I have been replacing the station number by hand in the script), but there are more than 100 stations.
I thought a loop could save me time. I've never written a loop before and don't know whether what I want is possible. I tried the code below, but it doesn't work.
A sample of my df8 data (txt):
RowNum,date,code,gauging_station,precp
1,01/01/2008 01:00,1586,315,0.4
2,01/01/2008 01:00,10990,16589,0.2
3,01/01/2008 01:00,17221,30523,0.6
4,01/01/2008 01:00,34592,17344,0
5,01/01/2008 01:00,38131,373,0
6,01/01/2008 01:00,44287,370,0
7,01/01/2008 01:00,53903,17314,0.4
8,01/01/2008 01:00,56005,16596,0
9,01/01/2008 01:00,56349,342,0
10,01/01/2008 01:00,57294,346,0
11,01/01/2008 01:00,64423,533,0
12,01/01/2008 01:00,75266,513,0
13,01/01/2008 01:00,96514,19187,0
Code:
station <- sample(50:150,53,replace=F)
for(i in station)
{
df08_1 <- filter(df08, V7==station [i])
colnames(df08_1) <- c("Date","gauging_station", "code", "precp")
df08_1 <- unique(df08_1)
final <- df08_1 %>%
group_by(Date=floor_date(Date, "1 hour"), gauging_station, code) %>%
summarize(precp=sum(precp))
write.csv(final,file="../station [i].csv", row.names = FALSE)
}
If you're not averse to using some tidyverse packages, I think you could simplify this a bit:
Updated with your new sample data - this runs ok on my computer:
Code:
library(dplyr)
dat %>%
select(-RowNum) %>%
distinct() %>%
group_by(date_hour = lubridate::floor_date(date, 'hour'), gauging_station, code) %>%
summarize(precp = sum(precp)) %>%
split(.$gauging_station) %>%
purrr::map(~write.csv(.x,
file = paste0('../', unique(.x$gauging_station), '.csv'), # one file name per station
row.names = FALSE))
Data:
dat <- data.table::fread("RowNum,date,code,gauging_station,precp
1,01/01/2008 01:00,1586,315,0.4
2,01/01/2008 01:00,10990,16589,0.2
3,01/01/2008 01:00,17221,30523,0.6
4,01/01/2008 01:00,34592,17344,0
5,01/01/2008 01:00,38131,373,0
6,01/01/2008 01:00,44287,370,0
7,01/01/2008 01:00,53903,17314,0.4
8,01/01/2008 01:00,56005,16596,0
9,01/01/2008 01:00,56349,342,0
10,01/01/2008 01:00,57294,346,0
11,01/01/2008 01:00,64423,533,0
12,01/01/2008 01:00,75266,513,0
13,01/01/2008 01:00,96514,19187,0") %>%
mutate(date = as.POSIXct(date, format = '%m/%d/%Y %H:%M'))
I can't comment for lack of reputation, but if the code works when you replace station [i] with the actual station number, it sounds like each station is a subset that has to be extracted from the df08 object (a data frame).
If I understand you correctly, I would do it as follows:
library(dplyr)
library(lubridate) # provides floor_date()
stations <- c(1:100) # put your station IDs into a vector
for(i in stations) { # run the script once for each station ID
# assuming that 'V7' is the name of the (unnamed) seventh column of df08, it could
# work like this:
df08_1 <- filter(df08, V7 == i) # if your station names are strings such as
# 'station 1', compare against paste("station", i) instead
colnames(df08_1) <- c("Date","gauging_station", "code", "precp")
df08_1 <- unique(df08_1)
final <- df08_1 %>%
group_by(Date=floor_date(Date, "1 hour"), gauging_station, code) %>%
summarize(precp=sum(precp)) # floor_date() rounds each timestamp down to the full hour
write.csv(final,file=paste("../station", i, ".csv", sep=""), row.names = FALSE)
# file names are generated automatically; you can modify the string to whatever you want
}
If this and the other examples don't work, could you provide some dummy data to work with, just so we can see what the df08 data frame looks like?
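For comparison, here is a self-contained base R sketch of the same per-station loop that needs no packages. The four demo rows are made up (modelled on the sample format, with two readings in the same hour), the day/month order in the date format is an assumption, and the files are written to a temporary directory instead of "../".

```r
# Made-up demo rows in the same layout as the sample data
raw <- read.csv(text = "RowNum,date,code,gauging_station,precp
1,01/01/2008 01:00,1586,315,0.4
2,01/01/2008 01:30,1586,315,0.2
3,01/01/2008 01:00,10990,16589,0.2
4,01/01/2008 02:15,10990,16589,0.6", stringsAsFactors = FALSE)

# Parse the timestamps (day/month order assumed) and label each row by its hour
raw$date <- as.POSIXct(raw$date, format = "%d/%m/%Y %H:%M", tz = "UTC")
raw$date_hour <- format(raw$date, "%Y-%m-%d %H:00")

# Hourly precipitation totals per station
hourly <- aggregate(precp ~ date_hour + gauging_station + code,
                    data = raw, FUN = sum)

# One CSV per station: loop over the station IDs found in the data
out_dir <- tempdir()
for (st in unique(hourly$gauging_station)) {
  one_station <- hourly[hourly$gauging_station == st, ]
  write.csv(one_station,
            file = file.path(out_dir, paste0("station_", st, ".csv")),
            row.names = FALSE)
}
list.files(out_dir, pattern = "^station_")
```

Looping over unique(hourly$gauging_station) means the station IDs never have to be typed in by hand, however many stations there are.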