I'm sure this is a simple question, but I'm relatively new here. I'm trying to extract the forecasted values into a CSV/table I can use outside of R. I followed along with the multiple series example from here: https://www.mitchelloharawild.com/blog/fable/ . I'm trying to extract the 2 years of forecasted data produced in this step:
fit %>%
forecast(h = "2 years") %>%
autoplot(tourism_state, level = NULL)
I can see the 3 models in the autoplot, but can't figure out how to get the forecasted values from the fit object. Any help is appreciated. It looks like there's quite a bit of information that can be generated (forecast intervals, etc.), so if there's a reference on what can be exported and how, please let me know. Thanks!
The forecasted values of a fable can be saved to a csv using readr::write_csv().
When used with columns that are not in a flat format (such as forecast distributions or intervals), the values will be stored as character strings and information will be lost. Before writing to a file, you should flatten these structures by extracting their components into separate columns.
You can use unpack_hilo() to extract the lower, upper, and level values within a <hilo> to create a flat data structure. Alternatively you can access the components of a <hilo> with $, for example: my_interval$lower.
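A minimal sketch of the flattening step described above, assuming the fable, tsibble, dplyr and readr packages are installed. The single-series ETS fit on the built-in tourism data, the column names lower_95 and upper_95, and the output path are illustrative choices, not the blog post's exact example.

```r
library(fable)
library(tsibble)
library(dplyr)

# Illustrative fit on one built-in series (an assumption, not the blog's full model set)
fit <- tourism %>%
  filter(Region == "Melbourne", Purpose == "Holiday") %>%
  model(ets = ETS(Trips))

fc <- fit %>% forecast(h = "2 years")

flat <- fc %>%
  hilo(level = 95) %>%                 # adds a <hilo> column named `95%`
  as_tibble() %>%
  mutate(
    Quarter  = format(Quarter),        # store the index as plain text
    lower_95 = `95%`$lower,            # pull interval bounds out with $
    upper_95 = `95%`$upper
  ) %>%
  # keep only flat columns; the distribution and <hilo> columns are dropped
  select(Region, State, Purpose, .model, Quarter, .mean, lower_95, upper_95)

out_path <- file.path(tempdir(), "forecasts.csv")  # hypothetical output location
readr::write_csv(flat, out_path)
```

Every column in the resulting file is a plain number or string, so it can be opened outside of R without losing the interval information.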
I think I know what I need to do, I just don't know how to make it work.
Example of Data:
data
I have decades of data in Excel in that format, which I uploaded to R. I believe I need to convert it to a time series or date format somehow, but retain the countries as categories so I can run the following regressions:
y ~ x1+x2
x1 ~ x2
y ~ x1
Can anyone share code/packages that can help me accomplish this? It feels simple, but I could not find any examples in a few hours of searching. Would ggplot also be recommended for producing figures with this data?
I tried converting it to as.xts, but that did not work, likely because of my poor understanding and the Country column. My failed attempt below:
modelts=as.xts(model1[,-1],order.by=as.Date(model1[,1],format='%m%d%Y'))
Your data is a great fit for a tsibble:
as_tsibble(your_df, key = "Country", index = "Year")
You can then use the wonderful tidyverts tools:
Tidy tools for time series
These use ggplot2 and dplyr.
A great guide for these tools is:
Forecasting: Principles and Practice (3rd ed)
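As a rough sketch of how this could look end to end, assuming the tsibble and fable packages are installed: the data frame below is simulated and only stands in for the Excel data, and the column names y, x1 and x2 are taken from the formulas in the question. TSLM() is fable's linear regression model, and it is fitted once per country because Country is the key.

```r
library(tsibble)
library(fable)

# Hypothetical panel standing in for the Excel data: one row per country and year
set.seed(1)
your_df <- data.frame(
  Country = rep(c("A", "B"), each = 20),
  Year    = rep(2001:2020, times = 2),
  x1      = rnorm(40),
  x2      = rnorm(40)
)
your_df$y <- 1 + 0.5 * your_df$x1 - 0.3 * your_df$x2 + rnorm(40, sd = 0.1)

# Country identifies each series, Year is the time index
panel <- as_tsibble(your_df, key = Country, index = Year)

# The three regressions from the question, fitted separately for each country
fits <- model(
  panel,
  y_on_both = TSLM(y ~ x1 + x2),
  x1_on_x2  = TSLM(x1 ~ x2),
  y_on_x1   = TSLM(y ~ x1)
)

# One row per country, model and coefficient
coefs <- tidy(fits)
print(coefs)
```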
I'm trying to use DESeq2's plotPCA function in a meta-analysis of data.
Most of the files I have received are raw counts pre-normalization. I'm then running DESeq2 to normalize them, then running plotPCA.
One of the files I received does not have raw counts or even the FASTQ files, just the data that has already been normalized by DESeq2.
How could I go about importing this data (non-integers) as a DESeqDataSet object after it has already been normalized?
Consensus in vignettes and other comments seems to be that objects can only be constructed from matrices of integers.
I was mostly concerned with getting the format the same between plots. Ultimately, I just used a workaround to get the plots looking the same via ggfortify.
If anyone is curious, I just ended up doing this. Note, the "names" file is organized like the meta file used as colData when building a DESeq object with DESeqDataSetFromMatrix, but I changed the name of the design column from "conditions" to "group" so it would match the output of plotPCA. The result should look identical.
library(ggfortify)
data<-read.csv('COUNTS.csv',sep = ",", header = TRUE, row.names = 1)
names<-read.csv("NAMES.csv")
PCA<-prcomp(t(data))
autoplot(PCA, data = names, colour = "group", size=3)
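To get closer to what plotPCA() does internally, one option is to restrict the PCA to the most variable genes and compute the percentage of variance per component yourself. The sketch below uses base R only and a simulated matrix; the log2 transform and the object names are assumptions for illustration, while the 500-gene cutoff mirrors plotPCA()'s ntop default.

```r
# Simulated, already-normalized expression matrix: genes in rows, samples in columns
set.seed(42)
expr <- matrix(rexp(2000 * 6, rate = 0.01), nrow = 2000,
               dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))

# Log-transform so a few highly expressed genes do not dominate the variance
log_expr <- log2(expr + 1)

# Keep the 500 most variable genes, mirroring the ntop = 500 default of plotPCA()
gene_var <- apply(log_expr, 1, var)
top <- order(gene_var, decreasing = TRUE)[1:500]

# Samples must be in rows for prcomp(), hence the transpose
pca <- prcomp(t(log_expr[top, ]))

# Percentage of variance explained by each component
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)
print(pct_var)
```

The pca object can then be passed to autoplot() in the same way as in the answer above.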
There are native R datasets, such as the Nile dataset, that are time series. However, if I actually look at the data set, whether as is, after as_tibble(), or after as.data.frame(), there is only one column: x (which, in this specific case, is the measurement of the annual flow of the river). Yet if I plot() the data in any of the three formats (raw, tibble or data.frame), I get plots with the dates:
(Technically, the x axis label changes, but that's not the point).
Where are these dates stored? How can I access them (to use ggplot(), for example), or even – how can I see them?
If you use str(Nile) or print(Nile), you'll see that the Nile data set is stored in a Time-Series object. You can use the start(), end() and frequency() functions to extract those attributes, then create a new column to store that information.
data(Nile)
new_df = data.frame(Nile)
# The step between observations is 1 / frequency(Nile), which is 1 year for this annual series
new_df$Time = seq(from = start(Nile)[[1]], to = end(Nile)[[1]], by = 1 / frequency(Nile))
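An alternative sketch that avoids rebuilding the sequence by hand: base R's time() returns the time index of any ts object directly. The column names Year and Flow are illustrative, and the plot is only drawn if ggplot2 happens to be installed.

```r
data(Nile)

# time() returns the time index of a ts object; as.numeric() drops the ts attributes
nile_df <- data.frame(
  Year = as.numeric(time(Nile)),
  Flow = as.numeric(Nile)
)

head(nile_df, 3)

# Plot with ggplot2 if it is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(nile_df, ggplot2::aes(x = Year, y = Flow)) +
    ggplot2::geom_line()
  print(p)
}
```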
I have a dataset with some 100,000 tweets and their sentiment scores attached. The original dataset just has two columns one for the tweets and one for their sentiment scores.
I am trying to build a data dictionary for it using the dataMeta package. Here is the code that I have written so far:
#Data Dictionary
var_desc<-c("Sentiment Score 0 for Negative sentences and 4 for Positive sentences","The tweets collected")
var_type<-c(0,1)
#Creating the Linker Data Frame
linker <- build_linker(tweets_train, variable_description = var_desc, variable_type = var_type)
linker
#Build the data dictionary
dict<-build_dict(my.data = tweets_train,linker=linker,option_description = NULL, prompt_varopts = F)
kable(dict,format="html",caption="Data dictionary for the Training dataset")
My problem is that I have provided the Variable Name and the Variable Description for the data dictionary, but I think the Variable Options column is trying to print all 100,000 tweets, which I want to avoid. Is it possible to set that column up manually too? Would the option_description argument of the build_dict function be of any help here?
I tried searching online for ideas, but to no avail. Here is the link that I have followed so far:
https://cran.r-project.org/web/packages/dataMeta/vignettes/dataMeta_Vignette.html
This is the first time I am trying to build a data dictionary and hence the struggle. Any suggestions would be extremely appreciated. Thanks in advance.
Hey guys, I am really new to R and I am having difficulty implementing the code. I am attaching the CSV file for the data; from that file I need to create a table showing the average salary of males and females.
Can you guys please help me with these questions:
Q1.
Use R to create a table showing the average salary of males and females who were placed. Review whether there is a gender gap in the data. In other words, observe whether the average salary of males is higher than the average salary of females in this dataset. I also need to run a t-test to test the following hypothesis:
H1: The average salary of the male MBAs is higher than the average salary of female MBAs.
Please see GhostCat's comment link about asking a question. That being said, the following may help you figure out how to do what you ask.
There are a few handy functions that you may want to familiarize yourself with. To read csv files you will need to run read.csv; while typing the call you can press the tab key to see the arguments you can enter, for example header = TRUE, which says the first row of the csv contains only header information. Note that file paths in R strings need forward slashes (or doubled backslashes).
dat <- read.csv(file = "~/WHERE/FILENAME.csv", header = TRUE)
To save any object as a data.frame you can use the as.data.frame or data.frame functions.
df <- as.data.frame(dat)
To split a data.frame by some value into separate lists you can use the split function.
df_Gender <- split(df, df$Gender)
The best way to work on lists is to familiarize yourself with the apply family of functions (for a full and runnable explanation, see "R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate").
If you run into very specific trouble while working on a step please search furiously before posting a question. Best of luck.
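To make the steps above concrete, here is a small base R sketch. The data frame is simulated and only stands in for the CSV; the column names Gender, Status and Salary and the "Placed" label are assumptions and will likely differ in the real file.

```r
# Hypothetical stand-in for the CSV: column names and values are made up
set.seed(7)
df <- data.frame(
  Gender = rep(c("M", "F"), each = 50),
  Status = sample(c("Placed", "Not Placed"), 100, replace = TRUE, prob = c(0.7, 0.3)),
  Salary = round(rnorm(100, mean = 280000, sd = 40000))
)

# Keep only the students who were placed
placed <- df[df$Status == "Placed", ]

# Table of average salary by gender
avg_salary <- aggregate(Salary ~ Gender, data = placed, FUN = mean)
print(avg_salary)

# The same averages using split() and sapply()
sapply(split(placed$Salary, placed$Gender), mean)

# One-sided Welch t-test: H1 is mean(male salary) > mean(female salary)
male   <- placed$Salary[placed$Gender == "M"]
female <- placed$Salary[placed$Gender == "F"]
result <- t.test(male, female, alternative = "greater")
print(result)
```

Passing the two groups as separate vectors keeps the direction of the one-sided test explicit: alternative = "greater" refers to the first argument (male) relative to the second (female).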