How to run Time Series Regression with years, countries, and multiple values in R? - r

I think I know what I need to do, I just don't know how to make it work.
Example of Data:
data
I have decades of data in Excel in that format, which I uploaded to R. I believe I need to convert it to a time series or date format somehow, but retain the countries as categories so I can run the following regressions:
y ~ x1+x2
x1 ~ x2
y ~ x1
Can anyone share code/packages that can help me accomplish this? It feels simple, but I could not find any examples in a few hours of searching. Would ggplot also be recommended for producing figures with this data?
I tried converting it to as.xts, but that did not work, likely because of my poor understanding and the Country column. My failed attempt below:
modelts=as.xts(model1[,-1],order.by=as.Date(model1[,1],format='%m%d%Y'))

Your data is a great fit for a tsibble:
as_tsibble(your_df, key = "Country", index = "Year")
You can then use the wonderful tidyverts tools:
Tidy tools for time series
These use ggplot2 and dplyr.
A great guide for these tools is:
Forecasting: Principles and Practice (3rd ed)
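As a minimal sketch of the three regressions from the question, assuming a long data frame with columns Country, Year, y, x1 and x2 (hypothetical names and simulated values, since the real data is not shown), one option is to fit each model separately per country with base R:

```r
# Hypothetical panel: 2 countries x 10 years; replace with your own data frame.
set.seed(1)
panel <- data.frame(
  Country = rep(c("A", "B"), each = 10),
  Year    = rep(2001:2010, times = 2),
  x1      = rnorm(20),
  x2      = rnorm(20)
)
panel$y <- 1 + 2 * panel$x1 - 0.5 * panel$x2 + rnorm(20, sd = 0.1)

# The three model formulas from the question.
formulas <- list(
  y_on_x1_x2 = y ~ x1 + x2,
  x1_on_x2   = x1 ~ x2,
  y_on_x1    = y ~ x1
)

# Fit every formula separately within each country.
fits <- lapply(split(panel, panel$Country), function(d) {
  lapply(formulas, function(f) lm(f, data = d))
})

# Coefficients of y ~ x1 + x2 for country "A".
coef(fits$A$y_on_x1_x2)
```

This treats each country as an independent series and ignores autocorrelation in the residuals. A pooled fit with country effects, such as lm(y ~ x1 + x2 + factor(Country), data = panel), is another option depending on the question being asked.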

Related

Download forecast values from tsibble (Fable)

I'm sure this is a simple question, but relatively new here. I'm trying to extract the forecasted values in a CSV/table I can use outside of R. I followed along with the multiple series example from here: https://www.mitchelloharawild.com/blog/fable/ . I'm trying to extract the 2 years forecasted data that's completed in this step:
fit %>%
forecast(h = "2 years") %>%
autoplot(tourism_state, level = NULL)
I can see the 3 models in the autoplot, but can't figure out how to get the forecasted values out of the fitted object. Any help is appreciated. It looks like quite a bit of information can be generated (forecast intervals, etc.), so if there is a reference on what can be extracted and how, please let me know. Thanks!
The forecasted values of a fable can be saved to a csv using readr::write_csv().
When used with columns that are not in a flat format (such as forecast distributions or intervals), the values will be stored as character strings and information will be lost. Before writing to a file, you should flatten these structures by extracting their components into separate columns.
You can use unpack_hilo() to extract the lower, upper, and level values within a <hilo> to create a flat data structure. Alternatively you can access the components of a <hilo> with $, for example: my_interval$lower.
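A sketch of the flattening step, assuming the tourism data that ships with tsibble and a single ETS model (the blog example fits several). The column names lower_95 and upper_95 and the output path are illustrative choices, not part of fable:

```r
library(fable)    # ETS(); also attaches fabletools (model(), forecast(), hilo())
library(tsibble)  # provides the `tourism` example data
library(dplyr)
library(readr)

# Total trips per state and quarter, as in the blog post.
tourism_state <- tourism %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

fit <- tourism_state %>% model(ets = ETS(Trips))

fc_flat <- fit %>%
  forecast(h = "2 years") %>%
  hilo(level = 95) %>%                     # adds a <hilo> column named `95%`
  as_tibble() %>%
  mutate(lower_95 = `95%`$lower,           # pull the interval bounds out with $
         upper_95 = `95%`$upper,
         Quarter  = as.character(Quarter)) %>%
  select(State, .model, Quarter, .mean, lower_95, upper_95)

# Only flat columns remain, so nothing is lost when writing.
out_file <- file.path(tempdir(), "forecasts.csv")
write_csv(fc_flat, out_file)
```

The distribution column (Trips) is dropped here on purpose; the point forecast lives in .mean and the interval bounds in the two new columns.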

SuperLearner for survival outcome in R

I recently started reading about the SuperLearner and I am trying to run SuperLearner for a survival outcome in R. I found example code in the Targeted Learning book by Mark J. van der Laan and Sherri Rose, which requires the data to be converted to long format before it can run.
The function that converts the data to the long format is no longer available. Here is the code:
library(survival)
data(lung)
subLung <- subset(lung, select = c(time, status, age,ph.ecog, ph.karno, pat.karno))
subLung$female <- (lung$sex - 1)
subLung <- subLung[complete.cases(subLung), ]
## Expand subLung to Long Format
longData <- SuperLearner:::createDiscrete(time =subLung$time,
event = (subLung$status == 2),dataX = subset(subLung,
select =-c(time, status)), n.delta = 30)
The createDiscrete function is no longer available in the SuperLearner package. Is there any other function that will convert the data to long format? If not, then a toy example of how to convert the data into appropriate long format would be very helpful. Or a sample R code to run SuperLearner for survival outcome would be also helpful.
I found the answer. To run SuperLearner for a survival outcome, the data has to be converted to counting process format, meaning that the time variable is split so that at most one event can happen in any given time interval. The survSplit function in the survival package does that! Thanks to Dr. Eric C. Polley.
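A minimal sketch of that conversion on the same lung subset, assuming 30-day intervals to mirror n.delta = 30 in the original call (whether createDiscrete used exactly this grid is not confirmed). The event, id, tstart and interval names are choices made here:

```r
library(survival)

subLung <- subset(lung, select = c(time, status, age, ph.ecog, ph.karno, pat.karno))
subLung$female <- lung$sex - 1
subLung <- subLung[complete.cases(subLung), ]

# Recode status (1 = censored, 2 = dead) to a 0/1 event indicator.
subLung$event  <- as.integer(subLung$status == 2)
subLung$status <- NULL
subLung$id     <- seq_len(nrow(subLung))   # keeps track of the original subject

# Cut points every 30 days up to the longest follow-up time.
cuts <- seq(30, max(subLung$time), by = 30)

# One row per subject and interval; `event` is 1 only in the interval where death occurs.
longData <- survSplit(Surv(time, event) ~ ., data = subLung,
                      cut = cuts, start = "tstart", episode = "interval")
head(longData)
```

Each row now covers the interval (tstart, time], and the covariates are repeated across a subject's rows.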

Retain SPSS value labels when working with data

I am analysing student level data from PISA 2015. The data is available in SPSS format here
I can load the data into R using the read_sav function in the haven package. I need to be able to edit the data in R and then save/export the data in SPSS format with the original value labels that are included in the SPSS download intact. The code I have used is:
library(haven)
student<-read_sav("CY6_MS_CMB_STU_QQQ.sav",user_na = T)
student2<-data.frame(student)
#some edits to data
write_sav(student2,"testdata1.sav")
When my colleague (who works in SPSS) tries to open the "testdata1.sav" the value labels are missing. I've read through the haven documentation and can't seem to find a solution for this. I have also tried read/write.spss in the foreign package but have issues loading in the dataset.
I am using R version 3.4.0 and the latest build of haven.
Does anyone know if there is a solution for this? I'd be very grateful of your help. Please let me know if you require any additional information to answer this.
library(foreign)
df <- read.spss("spss_file.sav", to.data.frame = TRUE)
This may not be exactly what you are looking for, because it uses the labels as the data. So if you have an SPSS file with 0 for "Male" and 1 for "Female," you will have a df with values that are all Males and Females. It gets you one step further, but perhaps isn't the whole solution. I'm working on the same problem and will let you know what else I find.
library ("sjlabelled")
student <- sjlabelled::read_spss("CY6_MS_CMB_STU_QQQ.sav")
student2 <-student
write_spss(student2,"testdata1.sav")
I have not tested this, but I hope it works. The sjlabelled package handles non-ASCII characters such as German umlauts well.
Keep in mind that R stores the labels as attributes. These attributes can be lost during some data transformations (subsetting, for example), and once they are lost in R they will not show up in SPSS either. The sjlabelled::copy_labels function is helpful in those cases:
student2 <- copy_labels(student2, student) #after data transformations and before export to spss
I think you need to recover the value labels in the data frame after importing the dataset into R, and then write that data frame to a .sav file.
# load libraries
library(haven)
library(dplyr)  # for %>% and mutate_at()
# load dataset
student <- read_sav("CY6_MS_CMB_STU_QQQ.sav", user_na = TRUE)
# identify all haven-labelled variables
factor_variable <- names(student)[sapply(student, is.labelled)]
# convert all haven-labelled variables into factors
student2 <- student %>%
  mutate_at(vars(all_of(factor_variable)), as_factor)
# write dataset
write_sav(student2, "testdata1.sav")
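Whichever approach is used, one way to check that value labels survive the export is a small round trip through a temporary file. This is a sketch with a made-up four-row data set (the column names and labels are hypothetical), not the PISA file:

```r
library(haven)

# Toy data with one labelled column, standing in for the SPSS import.
student <- data.frame(id = 1:4, score = c(510, 475, 530, 498))
student$gender <- labelled(c(0, 1, 1, 0), labels = c(Male = 0, Female = 1))

# An edit that leaves the labelled column untouched.
student$score_z <- (student$score - mean(student$score)) / sd(student$score)

# Write to SPSS format and read the file back in.
path <- tempfile(fileext = ".sav")
write_sav(student, path)
back <- read_sav(path)

# The value labels should still be attached to the re-imported column.
attr(back$gender, "labels")
```

If the labels are missing at this point, the edit step is the place to look, since that is where attributes tend to get dropped.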

R programming table creation

Hi, I am new to R and am having difficulty implementing the code. I am attaching the CSV file; using it, I need to create a table showing the average salary of males and females. CSV file for the data
Can you please help me with these questions:
Q1.
Use R to create a table showing the average salary of males and females who were placed. Review whether there is a gender gap in the data; in other words, observe whether the average salary of males is higher than the average salary of females in this dataset. I also need to run a t-test to test the following hypothesis:
H1: The average salary of the male MBAs is higher than the average salary of female MBAs.
Please see GhostCat's comment link about asking a question. That being said, the following may help you figure out how to do what you ask.
There are a few handy functions that you may want to familiarize yourself with. To read csv files you will need to run read.csv where you can press the tab key to inform you of arguments you can enter- for example, header = TRUE which says the first row of the csv is only header information.
dat <- read.csv(file = "~/WHERE/FILENAME.csv", header = TRUE)
To save any object as a data.frame you can use the as.data.frame or data.frame functions.
df <- as.data.frame(dat)
To split a data.frame by some value into separate lists you can use the split function.
df_Gender <- split(df, df$Gender)
The best way to work on lists is to familiarize yourself with the apply family of functions (for a full, runnable explanation, see "R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate").
If you run into very specific trouble while working on a step please search furiously before posting a question. Best of luck.
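Putting those pieces together, here is a small sketch of the table and the one-sided t-test. The data frame below is invented for illustration, and the column names Gender and Salary are assumptions about the CSV; swap in the result of read.csv() for real use:

```r
# Hypothetical placement data; replace with read.csv() on the real file.
placed <- data.frame(
  Gender = rep(c("M", "F"), each = 5),
  Salary = c(300, 320, 310, 350, 330,
             280, 290, 300, 270, 310)
)

# Table of average salary by gender.
avg_salary <- aggregate(Salary ~ Gender, data = placed, FUN = mean)
avg_salary

# Split the salaries explicitly so the direction of the test is unambiguous.
male   <- placed$Salary[placed$Gender == "M"]
female <- placed$Salary[placed$Gender == "F"]

# One-sided Welch t-test of H1: mean(male) > mean(female).
result <- t.test(male, female, alternative = "greater")
result$p.value
```

Passing the two groups as separate vectors avoids having to remember which factor level the formula interface treats as the first group.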

Transform a matrix txt file in spectra data for ChemoSpec package

I want to use ChemoSpec with mass spectra of about 60,000 data points each.
I already have them in one txt file as a matrix (X + 90 samples = 91 columns; 60,000 rows).
How can I turn this file into Spectra data without first exporting every sample to its own CSV file (which takes quite long in R given the size of my data)?
The typical (and only?) way to import data into ChemoSpec is by way of the getManyCsv() function, which as the question indicates requires one CSV file for each sample.
Creating 90 CSV files from the 91 columns - 60,000 rows file described, may be somewhat slow and tedious in R, but could be done with a standalone application, whether existing utility or some ad-hoc script.
An R-only solution would be to create a new function, say getOneBigCsv(), adapted from getManyCsv(). After all, the logic of getManyCsv() is relatively straightforward.
Don't expect such a solution to be sizzling fast, but it should, in any case, compare with the time it takes to run getManyCsv() and avoid having to create and manage the many files, hence overall be faster and certainly less messy.
Sorry I missed your question 2 days ago. I'm the author of ChemoSpec - always feel free to write directly to me in addition to posting somewhere.
The solution is straightforward. You already have your data in a matrix (after you read it in with >read.csv("file.txt")), so you can use it to manually create a Spectra object. In the R console type ?Spectra to see the structure of a Spectra object, which is a list with specific entries. You will need to put your X column (which I assume is mass) into the freq slot. Then the rest of the data matrix will go into the data slot. Then manually create the other needed entries (making sure the data types are correct). Finally, assign the Spectra class to your completed list by doing something like >class(my.spectra) <- "Spectra" and you should be good to go. I can give you more details on or off list if you describe your data a bit more fully. Perhaps you have already solved the problem?
By the way, ChemoSpec is totally untested with MS data, but I'd love to find out how it works for you. There may be some changes that would be helpful so I hope you'll send me feedback.
Good Luck, and let me know how else I can help.
Many years have passed and I am not sure whether anybody is still interested in this topic, but I had the same problem and wrote a small workaround that converts my data to class 'Spectra' by extracting the information from the data itself:
#Assumption:
# Data is stored as a numeric data.frame with column names presenting samples
# and row names including domain axis
dataframe2Spectra <- function(Spectrum_df,
freq = as.numeric(rownames(Spectrum_df)),
data = as.matrix(t(Spectrum_df)),
names = paste("YourFileDescription", 1:dim(Spectrum_df)[2]),
groups = rep(factor("Factor"), dim(Spectrum_df)[2]),
colors = rainbow(dim(Spectrum_df)[2]),
sym = 1:dim(Spectrum_df)[2],
alt.sym = letters[1:dim(Spectrum_df)[2]],
unit = c("a.u.", "Domain"),
desc = "Some signal. Describe it with 'desc'"){
features <- c("freq", "data", "names", "groups", "colors", "sym", "alt.sym", "unit", "desc")
Spectrum_chem <- vector("list", length(features))
names(Spectrum_chem) <- features
Spectrum_chem$freq <- freq
Spectrum_chem$data <- data
Spectrum_chem$names <- names
Spectrum_chem$groups <- groups
Spectrum_chem$colors <- colors
Spectrum_chem$sym <- sym
Spectrum_chem$alt.sym <- alt.sym
Spectrum_chem$unit <- unit
Spectrum_chem$desc <- desc
# important step
class(Spectrum_chem) <- "Spectra"
# some warnings
if (length(freq) != dim(data)[2]) print("Dimension of data is NOT #samples X length of freq")
if (length(names) > dim(data)[1]) print("Too many names")
if (length(names) < dim(data)[1]) print("Too few names")
if (length(groups) > dim(data)[1]) print("Too many groups")
if (length(groups) < dim(data)[1]) print("Too few groups")
if (length(colors) > dim(data)[1]) print("Too many colors")
if (length(colors) < dim(data)[1]) print("Too few colors")
if (!is.matrix(data)) print("'data' is not a matrix or it's not numeric")
return(Spectrum_chem)
}
Spectrum_chem <- dataframe2Spectra(Spectrum)
chkSpectra(Spectrum_chem)
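For the single-file layout described in the question (an X column followed by one column per sample), the data frame that dataframe2Spectra() expects could be prepared roughly as follows. This is a sketch on a tiny made-up file; the tab separator, the header row and the column name X are assumptions about the real txt file:

```r
# Hypothetical stand-in for the real file: X column + 3 samples, tab-separated.
path <- tempfile(fileext = ".txt")
toy <- data.frame(X  = c(100.0, 100.5, 101.0, 101.5),
                  S1 = c(1, 2, 3, 4),
                  S2 = c(2, 3, 4, 5),
                  S3 = c(0, 1, 0, 1))
write.table(toy, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the matrix-style file back in.
raw <- read.table(path, header = TRUE, sep = "\t")

# Samples stay as columns; the domain axis (e.g. m/z) moves to the row names.
Spectrum <- raw[, -1]
rownames(Spectrum) <- raw$X

# These are the defaults that dataframe2Spectra() derives from its input.
freq <- as.numeric(rownames(Spectrum))
data <- as.matrix(t(Spectrum))   # samples in rows, data points in columns
dim(data)
```

With the real file, Spectrum can then be passed to dataframe2Spectra() as in the call above.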

Resources