I think I know what I need to do, I just don't know how to make it work.
Example of Data:
data
I have decades of data in Excel in that format, which I uploaded to R. I believe I need to convert it to a time series or date format somehow, but retain the countries as categories so I can run the following regressions:
y ~ x1+x2
x1 ~ x2
y ~ x1
Can anyone share code/packages that can help me accomplish this? It feels simple, but I could not find any examples in a few hours of searching. Would ggplot also be recommended for producing figures with this data?
I tried converting it to as.xts, but that did not work, likely because of my poor understanding and the Country column. My failed attempt below:
modelts=as.xts(model1[,-1],order.by=as.Date(model1[,1],format='%m%d%Y'))
Your data is a great fit for a tsibble:
as_tsibble(your_df, key = "Country", index = "Year")
You can then use the wonderful tidyverts tools:
Tidy tools for time series
These use ggplot2 and dplyr.
A great guide for these tools is:
Forecasting: Principles and Practice (3rd ed)
I know there are number of questions similar to this here but 1) most of the solutions rely on deprecated functions like ml_create_dummy_variables and 2) other solutions are incomplete.
Is there a function or an approach to easily hot encode a categorical variable into multiple dummy variables in sparklyr?
This post asks for a solution in SparkR, incidentally a sparklyr solution is given that only works when the categories are unique in a given column, which renders its pointless.
This solution, results in a single dummy instead of a dummy for each category (grabs the first category). This is also the solution I stumbled onto (based on this post), which does not cut it:
iris_sdf <- copy_to(sc, iris, overwrite = TRUE)
iris_sdf %>%
ft_string_indexer(input_col = "Species", output_col = "species_num") %>%
mutate(cat_num = species_num + 1) %>%
ft_one_hot_encoder("species_num", "species_dum") %>%
ft_vector_assembler(c("species_dum"))
I'm looking for a solution that will take Species from the iris dataset and generate three columns -one for each category in Species (virginica, setosa, and versicolor). Using R, fastDummies package has what I need, but I'm left wondering how to achieve similar functionality in sparklyr.
Again, I'll note that ml_create_dummy_variables (suggested by this post) produced the following error:
Error in ml_create_dummy_variables(., "species_num", "species_dum") : Error in ml_create_dummy_variables(., "species_num", "species_dum") :
could not find function "ml_create_dummy_variables"
Note: I'm using sparklyr_1.3.1
I am analysing student level data from PISA 2015. The data is available in SPSS format here
I can load the data into R using the read_sav function in the haven package. I need to be able to edit the data in R and then save/export the data in SPSS format with the original value labels that are included in the SPSS download intact. The code I have used is:
library(haven)
student<-read_sav("CY6_MS_CMB_STU_QQQ.sav",user_na = T)
student2<-data.frame(student)
#some edits to data
write_sav(student2,"testdata1.sav")
When my colleague (who works in SPSS) tries to open the "testdata1.sav" the value labels are missing. I've read through the haven documentation and can't seem to find a solution for this. I have also tried read/write.spss in the foreign package but have issues loading in the dataset.
I am using R version 3.4.0 and the latest build of haven.
Does anyone know if there is a solution for this? I'd be very grateful of your help. Please let me know if you require any additional information to answer this.
library(foreign)
df <- read.spss("spss_file.sav", to.data.frame = TRUE)
This may not be exactly what you are looking for, because it uses the labels as the data. So if you have an SPSS file with 0 for "Male" and 1 for "Female," you will have a df with values that are all Males and Females. It gets you one step further, but perhaps isn't the whole solution. I'm working on the same problem and will let you know what else I find.
library ("sjlabelled")
student <- sjlabelled::read_spss("CY6_MS_CMB_STU_QQQ.sav")
student2 <-student
write_spss(student2,"testdata1.sav")
I did not try and hope it works. The sjlabelled package is good with non-ascii-characters as German Umlaute.
But keep in mind, that R saves the labels as attributes. These attributes are lost, when doing some data transformations (as subsetting data for example). When lost in R they won't show up in SPSS of course. The sjlabelled::copy_labels function is helpful in those cases:
student2 <- copy_labels(student2, student) #after data transformations and before export to spss
I think you need to recover the value labels in the dataframe after importing dataset into R. Then write the that dataframe into sav file.
#load library
libray(haven)
# load dataset
student<-read_sav("CY6_MS_CMB_STU_QQQ.sav",user_na = T)
#map to find class of each columns
map_dataset<-map(student, function(x)attr(x, "class"))
#Run for loop to identify all Factors with haven-labelled
factor_variable<-c()
for(i in 1:length(map_dataset)){
if(map_dataset[i]!="NULL"){
name<-names(map_dataset[i])
factor_variable<-c(factor_variable,name)
}
}
#convert all haven labelled variables into factor
student2<-student %>%
mutate_at(vars(factor_variable), as_factor)
#write dataset
write_sav(student2, "testdata1.sav")
I'm using R Sweave and wanted to begin my document with showing a sample of my table. My problem is, that my table has 39 variables and many rows. For the rows it isn't a problem, I can take only a few ones using sample_n, but I need to habe all my variables visible. It would sadly not fit either on a landscape sheet. I'm using xtable to generate my table. I think the easier way would be to put so much variables as possible on the sheet, then begin with the rest under, and so on, until it is all displayed.
Here some minimalist exemple:
dat <- bind_cols(mtcars, mtcars, mtcars, mtcars)
a <- as.data.frame(dat) %>%
sample_n(5)
print(xtable(a))
I've already know the longtable function, but it would only help me if I had too much rows, and not too much columns, isn't it? I'm still a little bit lost with having at the same time R and LaTeX on the same file...
An answer using my huxtable package. Create the table, then break it up by columns:
library(huxtable)
dat <- sample_n(as.data.frame(bind_cols(mtcars, mtcars, mtcars, mtcars)), 5)
ht <- as_hux(dat, add_colnames = TRUE)
# now format to taste:
bold(ht)[1,] <- TRUE
ht[,1:5] # first 5 columns. Will print as LaTeX within a Rmarkdown document
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
On general request, a community wiki on producing latex tables in R. In this post I'll give an overview of the most commonly used packages and blogs with code for producing latex tables from less straight-forward objects. Please feel free to add any I missed, and/or give tips, hints and little tricks on how to produce nicely formatted latex tables with R.
Packages :
xtable : for standard tables of most simple objects. A nice gallery with examples can be found here.
memisc : tool for management of survey data, contains some tools for latex tables of (basic) regression model estimates.
Hmisc contains a function latex() that creates a tex file containing the object of choice. It is pretty flexible, and can also output longtable latex tables. There's a lot of info in the help file ?latex
miscFuncs has a neat function 'latextable' that converts matrix data with mixed alphabetic and numeric entries into a LaTeX table and prints them to the console, so they can be copied and pasted into a LaTeX document.
texreg package (JSS paper) converts statistical model output into LaTeX tables. Merges multiple models. Can cope with about 50 different model types, including network models and multilevel models (lme and lme4).
reporttools package (JSS paper) is another option for descriptive statistics on continuous, categorical and date variables.
tables package is perhaps the most general LaTeX table making package in R for descriptive statistics
stargazer package makes nice comparative statistical model summary tables
Blogs and code snippets
There is the outreg function of Paul Johnson that gives Stata-like tables in Latex for the output of regressions. This one works great.
As given in an earlier question, there's a code snippet to adapt the memisc package for lme4 objects.
Related questions :
Suggestion for R/LaTeX table creation package
Rreport/LaTeX quality output package
sorting a table for latex output with xtable
Any way to produce a LaTeX table from an lme4 mer model fit object?
R data.frame with stacked specified titles for latex output with xtable
Automating adding tables fast to latex from R, with a very flexible and interesting syntax using the formula language
I'd like to add a mention of the "brew" package. You can write a brew template file which would be LaTeX with placeholders, and then "brew" it up to create a .tex file to \include or \input into your LaTeX. Something like:
\begin{tabular}{l l}
A & <%= fit$A %> \\
B & <%= fit$B %> \\
\end{tabular}
The brew syntax can also handle loops, so you can create a table row for each row of a dataframe.
Thanks Joris for creating this question. Hopefully, it will be made into a community wiki.
The booktabs packages in latex produces nice looking tables. Here is a blog post on how to use xtable to create latex tables that use booktabs
I would also add the apsrtable package to the mix as it produces nice looking regression tables.
Another Idea: Some of these packages (esp. memisc and apsrtable) allow easy extensions of the code to produce tables for different regression objects. One such example is the lme4 memisc code shown in the question. It might make sense to start a github repository to collect such code snippets, and over time maybe even add it to the memisc package. Any takers?
The stargazer package is another good option. It supports objects from many commonly used functions and packages (lm, glm, svyreg, survival, pscl, AER), as well as from zelig. In addition to regression tables, it can also output summary statistics for data frames, or directly output the content of data frames.
I have a few tricks and work arounds to interesting 'features' of xtable and Latex that I'll share here.
Trick #1: Removing Duplicates in Columns and Trick #2: Using Booktabs
First, load packages and define my clean function
<<label=first, include=FALSE, echo=FALSE>>=
library(xtable)
library(plyr)
cleanf <- function(x){
oldx <- c(FALSE, x[-1]==x[-length(x)])
# is the value equal to the previous?
res <- x
res[oldx] <- NA
return(res)}
Now generate some fake data
data<-data.frame(animal=sample(c("elephant", "dog", "cat", "fish", "snake"), 100,replace=TRUE),
colour=sample(c("red", "blue", "green", "yellow"), 100,replace=TRUE),
size=rnorm(100,mean=500, sd=150),
age=rlnorm(100, meanlog=3, sdlog=0.5))
#generate a table
datatable<-ddply(data, .(animal, colour), function(df) {
return(data.frame(size=mean(df$size), age=mean(df$age)))
})
Now we can generate a table, and use the clean function to remove duplicate entries in the label columns.
cleandata<-datatable
cleandata$animal<-cleanf(cleandata$animal)
cleandata$colour<-cleanf(cleandata$colour)
#
this is a normal xtable
<<label=normal, results=tex, echo=FALSE>>=
print(
xtable(
datatable
),
tabular.environment='longtable',
latex.environments=c("center"),
floating=FALSE,
include.rownames=FALSE
)
#
this is a normal xtable where a custom function has turned duplicates to NA
<<label=cleandata, results=tex, echo=FALSE>>=
print(
xtable(
cleandata
),
tabular.environment='longtable',
latex.environments=c("center"),
floating=FALSE,
include.rownames=FALSE
)
#
This table uses the booktab package (and needs a \usepackage{booktabs} in the headers)
\begin{table}[!h]
\centering
\caption{table using booktabs.}
\label{tab:mytable}
<<label=booktabs, echo=F,results=tex>>=
mat <- xtable(cleandata,digits=rep(2,ncol(cleandata)+1))
foo<-0:(length(mat$animal))
bar<-foo[!is.na(mat$animal)]
print(mat,
sanitize.text.function = function(x){x},
floating=FALSE,
include.rownames=FALSE,
hline.after=NULL,
add.to.row=list(pos=list(-1,bar,nrow(mat)),
command=c("\\toprule ", "\\midrule ", "\\bottomrule ")))
#could extend this with \cmidrule to have a partial line over
#a sub category column and \addlinespace to add space before a total row
#
Two utilities in package taRifx can be used in concert to produce multi-row tables of nested heirarchies.
library(datasets)
library(taRifx)
library(xtable)
test.by <- bytable(ChickWeight$weight, list( ChickWeight$Chick, ChickWeight$Diet) )
colnames(test.by) <- c('Diet','Chick','Mean Weight')
print(latex.table.by(test.by), include.rownames = FALSE, include.colnames = TRUE, sanitize.text.function = force)
# then add \usepackage{multirow} to the preamble of your LaTeX document
# for longtable support, add ,tabular.environment='longtable' to the print command (plus add in ,floating=FALSE), then \usepackage{longtable} to the LaTeX preamble
... and Trick #3 Multiline entries in an Xtable
Generate some more data
moredata<-data.frame(Nominal=c(1:5), n=rep(5,5),
MeanLinBias=signif(rnorm(5, mean=0, sd=10), digits=4),
LinCI=paste("(",signif(rnorm(5,mean=-2, sd=5), digits=4),
", ", signif(rnorm(5, mean=2, sd=5), digits=4),")",sep=""),
MeanQuadBias=signif(rnorm(5, mean=0, sd=10), digits=4),
QuadCI=paste("(",signif(rnorm(5,mean=-2, sd=5), digits=4),
", ", signif(rnorm(5, mean=2, sd=5), digits=4),")",sep=""))
names(moredata)<-c("Nominal", "n","Linear Model \nBias","Linear \nCI", "Quadratic Model \nBias", "Quadratic \nCI")
Now produce our xtable, using the sanitize function to replace column names with the correct Latex newline commands (including double backslashes so R is happy)
<<label=multilinetable, results=tex, echo=FALSE>>=
foo<-xtable(moredata)
align(foo) <- c( rep('c',3),'p{1.8in}','p{2in}','p{1.8in}','p{2in}' )
print(foo,
floating=FALSE,
include.rownames=FALSE,
sanitize.text.function = function(str) {
str<-gsub("\n","\\\\", str, fixed=TRUE)
return(str)
},
sanitize.colnames.function = function(str) {
str<-c("Nominal", "n","\\centering Linear Model\\\\ \\% Bias","\\centering Linear \\\\ 95\\%CI", "\\centering Quadratic Model\\\\ \\%Bias", "\\centering Quadratic \\\\ 95\\%CI \\tabularnewline")
return(str)
})
#
(although this isn't perfect, as we need \tabularnewline so the table is formatted correctly, and Xtable still puts in a final \, so we end up with a blank line below the table header.)
You can also use the latextable function from the R package micsFuncs:
http://cran.r-project.org/web/packages/miscFuncs/index.html
latextable(M) where M is a matrix with mixed alphabetic and numeric entries outputs a basic LaTeX table onto screen, which can be copied and pasted into a LaTeX document. Where there are small numbers, it also replaces these with index notation (eg 1.2x10^{-3}).
Another R package for aggregating multiple regression models into LaTeX tables is texreg.