texreg: How to save space in htmlreg-regression tables? - r

Is there a way to reduce the vertical size of an htmlreg table? I have several models with about 10 or more independent variables each, so at the moment I need an entire page to present my regression results. I would like to save some lines by reporting the SD or SE (in parentheses) inline, next to the coefficients. The straightforward way is to create the output tables in LaTeX by hand. Is there an easier, more elegant solution?
library(texreg)
alligator = data.frame(
lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
alli.mod = lm(lnWeight ~ lnLength, data = alligator)
htmlreg(list(alli.mod),
file="MWE_regression.html",
caption="MWE Regression",
caption.above = TRUE,
include.rs=TRUE,
include.adjrs = FALSE,
digits=3,
stars=c(0.01, 0.05, 0.1)
)
Thanks :)
Update: A simple and elegant alternative is the stargazer package, which is quite new: http://www.r-statistics.com/2013/01/stargazer-package-for-beautiful-latex-tables-from-r-statistical-models-output/ It exports very nice LaTeX tables, in my opinion much better than those from texreg.

If you'd still like to accomplish this with texreg and its htmlreg function, just use the argument single.row = TRUE. Here is your full example:
library(texreg)
alligator = data.frame(
lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
alli.mod = lm(lnWeight ~ lnLength, data = alligator)
htmlreg(list(alli.mod),
single.row = TRUE,
file="MWE_regression.html",
caption="MWE Regression",
caption.above = TRUE,
include.rs=TRUE,
include.adjrs = FALSE,
digits=3,
stars=c(0.01, 0.05, 0.1)
)
Your original result is on the left, the new output on the right:
Use the texreg function instead of htmlreg if you are interested in LaTeX output rather than HTML, as mentioned in your additional comment.
Edit: The HTML output now looks a bit nicer with more recent versions of texreg.

Related

Frequency table for intervals

I saved my data in the object datos so that I could calculate the absolute frequency (AF, called FA in the code) and the relative frequency (RF, called FR in the code) of a continuous variable in column V1. However, I want the frequencies to be grouped into intervals.
I don't really know how to do this, so I need your help. If anyone has an idea, here is my code,
where k is the number of intervals I'm using
and largo is the number of observations I have.
read.table("datos.txt", header = FALSE)-> datos
largo<-length(datos$V1)
k<- (1+log2(largo))
k<-round(k,digits = 0)
vectordatos <- datos$V1
histograma<-hist(datos$V1,breaks=k)
FA<-table(datos$V1)
FR<-table(datos$V1)/largo
FA
FR
The datos object is as follows:
datos = structure(list(V1 = c(6.16, 5.83, 5.66, 3.63, 1.38, 9.64, 7.46,
5.34, 7.93, 8.5, 4.18, 5.18, 10.27, 5.41, 4.76, 4.67, 10.02,
7.1, 5.38, 8.55, 4.85, 8.28, 2.9, 7.18, 6.54, 5.66, 7.26, 6.45,
3.97, 6.55, 5.15, 7.83, 5.52, 7.21, 7.3, 6.19)), class = "data.frame", row.names = c(NA,
-36L))
You can use cut to split the values into k intervals and table to count the observations per interval:
table(cut(datos$V1,k))
Output:
(1.37,2.86] (2.86,4.34] (4.34,5.83] (5.83,7.31] (7.31,8.79] (8.79,10.3]
          1           4          11          11           6           3
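To get both the absolute and the relative frequencies per interval in one table, the same idea can be extended as in the minimal sketch below. It assumes the data from the question and the same k; the names intervalos and tabla are made up for this illustration.

```r
# Data from the question
datos <- data.frame(V1 = c(6.16, 5.83, 5.66, 3.63, 1.38, 9.64, 7.46,
  5.34, 7.93, 8.5, 4.18, 5.18, 10.27, 5.41, 4.76, 4.67, 10.02,
  7.1, 5.38, 8.55, 4.85, 8.28, 2.9, 7.18, 6.54, 5.66, 7.26, 6.45,
  3.97, 6.55, 5.15, 7.83, 5.52, 7.21, 7.3, 6.19))

largo <- nrow(datos)               # number of observations
k <- round(1 + log2(largo))        # number of intervals, as in the question

intervalos <- cut(datos$V1, breaks = k)  # assign each value to one of k intervals
FA <- table(intervalos)                  # absolute frequency per interval
FR <- prop.table(FA)                     # relative frequency per interval

# Combine both into one table (hypothetical name)
tabla <- data.frame(intervalo = names(FA),
                    FA = as.vector(FA),
                    FR = round(as.vector(FR), 3))
print(tabla)
```

prop.table divides each count by the total, so it gives the same result as dividing the table by largo.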

In which cases should I use the standard deviation "MR" or "SD" in an xbar.one chart?

I cannot really understand the practical use of these two types of standard deviation, and the information in the qcc manual is sometimes confusing. I used the example provided in the qcc manual, but I don't know which one to choose or what the reason for that choice would be.
Thanks a lot for any support.
library(qcc)
# Water content of antifreeze data (Wetherill and Brown, 1991, p. 120)
x <- c(2.23, 2.53, 2.62, 2.63, 2.58, 2.44, 2.49, 2.34, 2.95, 2.54, 2.60, 2.45,
2.17, 2.58, 2.57, 2.44, 2.38, 2.23, 2.23, 2.54, 2.66, 2.84, 2.81, 2.39,
2.56, 2.70, 3.00, 2.81, 2.77, 2.89, 2.54, 2.98, 2.35, 2.53)
# the Shewhart control chart for one-at-a-time data
# 1) using MR (moving range, the default)
qcc(x, type="xbar.one", data.name="Water content (in ppm) of batches of antifreeze")
# 2) using SD (sample standard deviation)
qcc(x, type="xbar.one", std.dev = "SD", data.name="Water content (in ppm) of batches of antifreeze")

plotting threshold/piecewise/change point models with 95% confidence intervals in R

I would like to plot a threshold model with smooth 95% confidence interval lines between line segments. You would think this would be on the simple side but I have not been able to find an answer!
My thresholds/breakpoints are known, so it would be great if there were a way to visualize this data. I have tried the segmented package, which produces the following plot:
The plot shows a threshold model with a breakpoint at 5.4. However, the confidence intervals are not smooth between regression lines.
If anyone knows of any way to produce smooth (i.e. without the jump between line segments) CI lines between segmented regression lines (ideally in ggplot) that would be amazing. Thank you so much.
I have included sample data and the code I have tried below:
x <- c(2.26, 1.95, 1.59, 1.81, 2.01, 1.63, 1.62, 1.19, 1.41, 1.35, 1.32, 1.52, 1.10, 1.12, 1.11, 1.14, 1.23, 1.05, 0.95, 1.30, 0.79,
0.81, 1.15, 1.10, 1.29, 0.97, 1.05, 1.05, 0.84, 0.64, 0.80, 0.81, 0.61, 0.71, 0.75, 0.30, 0.30, 0.49, 1.13, 0.55, 0.77, 0.51,
0.67, 0.43, 1.11, 0.29, 0.36, 0.57, 0.02, 0.22, 3.18, 3.79, 2.49, 2.44, 2.12, 2.45, 3.22, 3.44, 3.86, 3.53, 3.13)
y <- c(22.37, 18.93, 16.99, 15.65, 14.62, 13.79, 13.09, 12.49, 11.95, 11.48, 11.05, 10.66, 10.30, 9.96, 9.65, 9.35, 9.07, 8.81,
8.56, 8.32, 8.09, 7.87, 7.65, 7.45, 7.25, 7.05, 6.86, 6.68, 6.50, 6.32, 6.15, 5.97, 5.80, 5.63, 5.47, 5.30,
5.13, 4.96, 4.80, 4.63, 4.45, 4.28, 4.09, 3.90, 3.71, 3.50, 3.27, 3.01, 2.70, 2.28, 22.37, 16.99, 11.05, 8.81,
8.56, 8.32, 7.25, 7.05, 6.50, 6.15, 5.63)
library(segmented)
lin.mod <- lm(y ~ x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=2)
plot(x, y)
plot(segmented.mod, add=TRUE, conf.level = 0.95)
which produces the following plot (and associated jumps in 95% confidence intervals):
[Figure: segmented plot]
Background: The non-smoothness in existing change point packages is due to the fact that frequentist packages operate with a fixed change point value. But as with all inferred parameters, this is wrong, because there is in fact uncertainty about the location of the change.
Solution: AFAIK, only Bayesian methods can quantify that and the mcp package fills this space.
library(mcp)
library(ggplot2) # needed for ggtitle() below
model = list(
y ~ 1 + x, # Segment 1: Intercept and slope
~ 0 + x # Segment 2: Joined slope (no intercept change)
)
fit = mcp(model, data = data.frame(x, y))
Default plot (plot.mcpfit() returns a ggplot object):
plot(fit) + ggtitle("Default plot")
Each line represents a possible model that generated the data. The posterior for the change point is shown as a blue density. You can add a credible interval on top using plot(fit, q_fit = TRUE) or plot it alone:
plot(fit, lines = 0, q_fit = c(0.025, 0.975), cp_dens = FALSE) + ggtitle("Credible interval only")
If your change point is indeed known and if you want to model different residual scales for each segment (i.e., quasi-emulate segmented), you can do:
model2 = list(
y ~ 1 + x,
~ 0 + x + sigma(1) # Add intercept change in residual scale
)
fit = mcp(model2, data = data.frame(x, y), prior = list(cp_1 = 1.9)) # Note: prior is a fixed value - not a distribution.
plot(fit, q_fit = TRUE, cp_dens = FALSE)
Notice that the CI does not "jump" around the change point as in segmented. I believe that this is the correct behavior. Disclosure: I am the author of mcp.
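For comparison only: if the change point is treated as fixed and known, a band without jumps can also be sketched in base R by fitting one lm with a hinge term, so that both segments come from a single model. This is not the mcp approach and it ignores any uncertainty about the location of the change, which is the limitation described in the Background paragraph. The value psi = 1.9 is borrowed from the fixed prior above and is an assumption, as are the names psi, hinge.mod, grid and band.

```r
# Data from the question
x <- c(2.26, 1.95, 1.59, 1.81, 2.01, 1.63, 1.62, 1.19, 1.41, 1.35, 1.32, 1.52, 1.10, 1.12, 1.11, 1.14, 1.23, 1.05, 0.95, 1.30, 0.79,
       0.81, 1.15, 1.10, 1.29, 0.97, 1.05, 1.05, 0.84, 0.64, 0.80, 0.81, 0.61, 0.71, 0.75, 0.30, 0.30, 0.49, 1.13, 0.55, 0.77, 0.51,
       0.67, 0.43, 1.11, 0.29, 0.36, 0.57, 0.02, 0.22, 3.18, 3.79, 2.49, 2.44, 2.12, 2.45, 3.22, 3.44, 3.86, 3.53, 3.13)
y <- c(22.37, 18.93, 16.99, 15.65, 14.62, 13.79, 13.09, 12.49, 11.95, 11.48, 11.05, 10.66, 10.30, 9.96, 9.65, 9.35, 9.07, 8.81,
       8.56, 8.32, 8.09, 7.87, 7.65, 7.45, 7.25, 7.05, 6.86, 6.68, 6.50, 6.32, 6.15, 5.97, 5.80, 5.63, 5.47, 5.30,
       5.13, 4.96, 4.80, 4.63, 4.45, 4.28, 4.09, 3.90, 3.71, 3.50, 3.27, 3.01, 2.70, 2.28, 22.37, 16.99, 11.05, 8.81,
       8.56, 8.32, 7.25, 7.05, 6.50, 6.15, 5.63)

psi <- 1.9  # assumed known change point (taken from the fixed prior above)

# pmax(x - psi, 0) is zero left of psi and linear right of it, so the fitted
# line changes slope at psi but stays continuous
hinge.mod <- lm(y ~ x + pmax(x - psi, 0))

# Pointwise 95% confidence band on a fine grid
grid <- seq(min(x), max(x), length.out = 200)
band <- predict(hinge.mod, newdata = data.frame(x = grid),
                interval = "confidence", level = 0.95)

plot(x, y)
lines(grid, band[, "fit"])
lines(grid, band[, "lwr"], lty = 2)
lines(grid, band[, "upr"], lty = 2)
```

Because the band comes from one linear model evaluated on a continuous grid, it has no jump at psi, but it is only as trustworthy as the assumed change point.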

Creating a 2D-grid or raster in R comparing all respondents with all variables

reproducible example for my data:
df_1 <- data.frame(cbind("Thriving" = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53), "Satisfaction" = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72), "Wellbeing" = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74), "id" = c(1:8)))
As you can see, it includes three variables of psychological measures and one identifier with an id for each respondent.
Now, my aim is to create a 2D grid that gives me a nice overview of all the values for all respondents on each of the variables. On the x-axis I would have the id of each respondent and on the y-axis all the variables, and the colour of each cell would depend on the value: 1 to 3 in red, 3 to 5 in yellow and 5 to 7 in green. The style of the grid should be like this image.
All I have achieved so far is the following code which compresses all the variables/items into one column so they can together be portrayed on the y-axis - the id is of course included in its own column as are the values:
library(dplyr)
library(tidyr)
df_1 %>%
select("Thr" = Thriving, "Stf" = Satisfaction, "Wb" = Wellbeing, "id" = id) %>%
na.omit() %>%
gather(key = "variable", value = "value", -id)
I am looking for a solution that works without storing the data in a new frame.
Also, I am looking for a solution that would be useful for even 100 or more respondents and up to about 40 variables. It would not matter if one rectangle would then be very small, I just want to have a nice colour play which would give a nice taste of where an organisation may be achieving low or high - and how it is achieving in general.
Thanks for reading, very grateful for any help!
There is probably a better graphics oriented approach, but you can do this with base plot and by treating your data as a raster:
library(raster)
df_1 <- cbind("Thriving" = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53), "Satisfaction" = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72), "Wellbeing" = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74), "id" = c(1:8))
r <- raster(ncol=nrow(df_1), nrow=3, xmn=0, xmx=8, ymn=0, ymx=3)
values(r) <- as.vector(as.matrix(df_1[,1:3]))
plot(r, axes=F, box=F, asp=NA)
axis(1, at=seq(-0.5, 8.5, 1), 0:9)
axis(2, at=seq(-0.5, 3.5, 1), c("", colnames(df_1)), las=1)
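A base-graphics variant without any extra package is image() with explicit breaks, sketched below under the assumption that the three colour bands from the question (1 to 3 red, 3 to 5 yellow, 5 to 7 green) are what is wanted. It converts the three measure columns to a matrix rather than a new data frame; the names vars, z and bands are made up for this illustration.

```r
# Data from the question
df_1 <- data.frame(
  Thriving     = c(2.33, 4.21, 6.37, 5.28, 4.87, 3.92, 4.16, 5.53),
  Satisfaction = c(3.45, 4.53, 6.01, 3.87, 2.92, 4.50, 5.89, 4.72),
  Wellbeing    = c(2.82, 3.45, 5.23, 3.93, 6.18, 4.22, 3.68, 4.74),
  id           = 1:8
)

vars <- c("Thriving", "Satisfaction", "Wellbeing")
z <- as.matrix(df_1[, vars])   # rows = respondents, columns = variables

# image() puts the matrix rows along x and the columns along y;
# breaks needs one more element than col
image(x = df_1$id, y = seq_along(vars), z = z,
      breaks = c(1, 3, 5, 7), col = c("red", "yellow", "green"),
      axes = FALSE, xlab = "Respondent id", ylab = "")
axis(1, at = df_1$id)
axis(2, at = seq_along(vars), labels = vars, las = 1)

# The same binning as a quick count per colour band
bands <- cut(z, breaks = c(1, 3, 5, 7), labels = c("red", "yellow", "green"))
table(bands)
```

The same call should scale to more respondents and variables, since each cell simply gets smaller as the matrix grows.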

How to set different colors in different ranges of one single line in R?

I am facing a problem with making a moving average crossover plot in R. I added ma5 and ma20 as two moving averages based on my price data.
Here is my sample code:
library("TTR")
library(ggplot2)
price<- c(3.23, 3.29, 3.29 , 3.21, 3.19, 3.18, 3.11, 3.21, 3.25,
3.40, 3.39, 3.28, 3.31 , 3.32, 3.21, 3.19, 3.16, 3.20,
3.26, 3.30, 3.42, 3.44, 3.40, 3.41, 3.59, 3.83, 3.70,
3.86, 3.95, 3.89, 3.94, 3.78, 3.69, 3.74, 3.67, 3.69,
3.69, 3.61, 3.64, 3.83, 3.88, 3.98, 3.98, 3.86, 3.87,
3.93, 4.05, 3.97, 3.90, 3.93, 4.00, 3.85, 3.81, 4.20,
4.17, 4.05, 3.95, 3.96, 3.97, 3.96, 3.88, 3.85, 3.79,
3.83, 3.68, 3.72, 3.73, 3.81, 3.80, 3.81, 3.75, 3.87,
3.90, 3.89, 3.86, 3.81, 3.86, 3.78, 3.83, 3.87, 3.91,
4.05, 4.07, 4.02, 4.01, 4.00, 4.13, 4.07, 4.11, 4.26,
4.33, 4.32, 4.39, 4.30, 4.39, 4.68, 4.69, 4.70, 4.60,
4.71, 4.81, 4.73, 4.78, 4.64, 4.64, 4.64, 4.61, 4.44)
date<- c("2004-01-23", "2004-01-26", "2004-01-27", "2004-01-28",
"2004-02-02", "2004-02-03", "2004-02-04", "2004-02-05",
"2004-02-06", "2004-02-11", "2004-02-12", "2004-02-13",
"2004-02-17", "2004-02-18", "2004-02-19", "2004-02-20",
"2004-02-23", "2004-02-24", "2004-02-25", "2004-02-26",
"2004-02-27", "2004-03-01", "2004-03-02", "2004-03-03",
"2004-03-04", "2004-03-05", "2004-03-08", "2004-03-09",
"2004-03-10", "2004-03-11", "2004-03-12", "2004-03-15",
"2004-03-16", "2004-03-17", "2004-03-18", "2004-03-19",
"2004-03-22", "2004-03-23", "2004-03-24", "2004-03-25",
"2004-03-26", "2004-03-29", "2004-03-30", "2004-03-31",
"2004-04-01", "2004-04-02", "2004-04-05", "2004-04-06",
"2004-04-07", "2004-04-08", "2004-04-12", "2004-04-13",
"2004-04-14", "2004-04-15", "2004-04-16", "2004-04-19",
"2004-04-20", "2004-04-21", "2004-04-22", "2004-04-23",
"2004-04-26", "2004-04-27", "2004-04-28", "2004-04-29",
"2004-04-30", "2004-05-03", "2004-05-04", "2004-05-05",
"2004-05-06", "2004-05-07", "2004-05-10", "2004-05-11",
"2004-05-12", "2004-05-13", "2004-05-14", "2004-05-17",
"2004-05-18", "2004-05-19", "2004-05-20", "2004-05-21",
"2004-05-24", "2004-05-25", "2004-05-26", "2004-05-27",
"2004-05-28", "2004-06-01", "2004-06-02", "2004-06-03",
"2004-06-04", "2004-06-07", "2004-06-08", "2004-06-09",
"2004-06-10", "2004-06-14", "2004-06-15", "2004-06-16",
"2004-06-17", "2004-06-18", "2004-06-21", "2004-06-22",
"2004-06-23", "2004-06-24", "2004-06-25", "2004-06-28",
"2004-06-29", "2004-06-30", "2004-07-01", "2004-07-02")
price5<- SMA(price,n=5)
price20<- SMA(price,n=20)
pricedf<- data.frame(date,price5,price20,price)
ggplot(pricedf,aes(date))+geom_line(group=1,aes(y=price5,colour="ma5"))+geom_line(group=1,aes(y=price20,colour="ma20"))+xlab("Date")+ylab("Price")
There are a couple of crossovers on this plot. What I want is for the 'price' line (one column in my pricedf) to be drawn in green where ma5 is above ma20 and in red where ma5 is below ma20.
The example plot looks like this picture.
I was thinking of subtracting price20 from price5 and checking whether the values are greater than 0. But how can I draw them on another plot with different colors?
Here is how I solved it.
library("TTR")
library(ggplot2)
price<- c(3.23, 3.29, 3.29 , 3.21, 3.19, 3.18, 3.11, 3.21, 3.25,
3.40, 3.39, 3.28, 3.31 , 3.32, 3.21, 3.19, 3.16, 3.20,
3.26, 3.30, 3.42, 3.44, 3.40, 3.41, 3.59, 3.83, 3.70,
3.86, 3.95, 3.89, 3.94, 3.78, 3.69, 3.74, 3.67, 3.69,
3.69, 3.61, 3.64, 3.83, 3.88, 3.98, 3.98, 3.86, 3.87,
3.93, 4.05, 3.97, 3.90, 3.93, 4.00, 3.85, 3.81, 4.20,
4.17, 4.05, 3.95, 3.96, 3.97, 3.96, 3.88, 3.85, 3.79,
3.83, 3.68, 3.72, 3.73, 3.81, 3.80, 3.81, 3.75, 3.87,
3.90, 3.89, 3.86, 3.81, 3.86, 3.78, 3.83, 3.87, 3.91,
4.05, 4.07, 4.02, 4.01, 4.00, 4.13, 4.07, 4.11, 4.26,
4.33, 4.32, 4.39, 4.30, 4.39, 4.68, 4.69, 4.70, 4.60,
4.71, 4.81, 4.73, 4.78, 4.64, 4.64, 4.64, 4.61, 4.44)
date<- c("2004-01-23", "2004-01-26", "2004-01-27", "2004-01-28",
"2004-02-02", "2004-02-03", "2004-02-04", "2004-02-05",
"2004-02-06", "2004-02-11", "2004-02-12", "2004-02-13",
"2004-02-17", "2004-02-18", "2004-02-19", "2004-02-20",
"2004-02-23", "2004-02-24", "2004-02-25", "2004-02-26",
"2004-02-27", "2004-03-01", "2004-03-02", "2004-03-03",
"2004-03-04", "2004-03-05", "2004-03-08", "2004-03-09",
"2004-03-10", "2004-03-11", "2004-03-12", "2004-03-15",
"2004-03-16", "2004-03-17", "2004-03-18", "2004-03-19",
"2004-03-22", "2004-03-23", "2004-03-24", "2004-03-25",
"2004-03-26", "2004-03-29", "2004-03-30", "2004-03-31",
"2004-04-01", "2004-04-02", "2004-04-05", "2004-04-06",
"2004-04-07", "2004-04-08", "2004-04-12", "2004-04-13",
"2004-04-14", "2004-04-15", "2004-04-16", "2004-04-19",
"2004-04-20", "2004-04-21", "2004-04-22", "2004-04-23",
"2004-04-26", "2004-04-27", "2004-04-28", "2004-04-29",
"2004-04-30", "2004-05-03", "2004-05-04", "2004-05-05",
"2004-05-06", "2004-05-07", "2004-05-10", "2004-05-11",
"2004-05-12", "2004-05-13", "2004-05-14", "2004-05-17",
"2004-05-18", "2004-05-19", "2004-05-20", "2004-05-21",
"2004-05-24", "2004-05-25", "2004-05-26", "2004-05-27",
"2004-05-28", "2004-06-01", "2004-06-02", "2004-06-03",
"2004-06-04", "2004-06-07", "2004-06-08", "2004-06-09",
"2004-06-10", "2004-06-14", "2004-06-15", "2004-06-16",
"2004-06-17", "2004-06-18", "2004-06-21", "2004-06-22",
"2004-06-23", "2004-06-24", "2004-06-25", "2004-06-28",
"2004-06-29", "2004-06-30", "2004-07-01", "2004-07-02")
price5<- SMA(price,n=5)
price20<- SMA(price,n=20)
pricedf<- data.frame(date,price5,price20,price)
coldf <- ifelse(price5 - price20 > 0, 'green', 'red')
coldf[is.na(coldf)] <- 'green'
coldf
ggplot(pricedf) +
geom_line( aes(x = date, y=price, group = 1, color = coldf)) +
xlab("Date") +
ylab("Price")
Which creates this graph.
I used an ifelse statement to find where price5 is greater than price20. The problem is that this creates NAs (the moving averages are undefined for the first observations), which I filled with green. I am not 100% sure which way round you wanted the green and the red. You can simply change
coldf <- ifelse(price5 - price20 > 0, 'green', 'red')
to
coldf <- ifelse(price5 - price20 > 0, 'red', 'green')
Which looks like graph2.
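One possible reason the colours appear swapped: because color = coldf sits inside aes(), ggplot2 treats 'green' and 'red' as category labels and assigns its own default palette to them. A minimal sketch of one way around this is scale_colour_identity(), which uses the strings as literal colours. To stay short, the sketch assumes only the first 36 prices from the question, replaces the dates with an integer index, and uses a base R stand-in for TTR::SMA; the helper sma, the columns idx and col, and the grey warm-up colour are all choices made for this illustration.

```r
# Base R stand-in for TTR::SMA: trailing mean, NA until the window is full
sma <- function(v, n) as.numeric(stats::filter(v, rep(1 / n, n), sides = 1))

# First 36 prices from the question
price <- c(3.23, 3.29, 3.29, 3.21, 3.19, 3.18, 3.11, 3.21, 3.25,
           3.40, 3.39, 3.28, 3.31, 3.32, 3.21, 3.19, 3.16, 3.20,
           3.26, 3.30, 3.42, 3.44, 3.40, 3.41, 3.59, 3.83, 3.70,
           3.86, 3.95, 3.89, 3.94, 3.78, 3.69, 3.74, 3.67, 3.69)

pricedf <- data.frame(idx = seq_along(price), price = price,
                      ma5 = sma(price, 5), ma20 = sma(price, 20))

# Grey marks the warm-up period where ma20 is still NA (an arbitrary choice)
pricedf$col <- ifelse(is.na(pricedf$ma20), "grey50",
                      ifelse(pricedf$ma5 > pricedf$ma20, "green", "red"))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pricedf, aes(x = idx, y = price, group = 1, colour = col)) +
    geom_line() +
    scale_colour_identity() +  # use the strings in `col` as literal colours
    xlab("Observation") +
    ylab("Price")
  print(p)
}
```

With the identity scale there is no need to swap the labels in ifelse: a segment labelled 'green' is drawn in green.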

Resources