How to cumulative sum by time 'bin' in R - with ggplot

I'm trying to create a cumulative graph as shown here, with another caveat. The steps should be based on 2-minute time intervals, where an interval may have multiple entries or none at all.
I used rowSums to create the column for the value to be used in cumsum,
e.g.,
df_so$intraverbal <- rowSums(df_so[-1] == "intraverbal")
df_so$tact <- rowSums(df_so[-1] == "tact")
df_so$mand <- rowSums(df_so[-1] == "mand")
df_so$echoic <- rowSums(df_so[-1] == "echoic")
The graph worked out well enough using plot:
plot(cumsum(df_so$intraverbal), type="s")
However, there are a couple ways it falls short. Ideally, the data would be tallied and labeled according to the "time bin". At the very least, the time bins should be on the x-label, but the increments aren't continuous. Hypothetically, I should be using dplyr or lapply to melt and combine them - but I'm not sure how. Perhaps, something as described here.
It would be nice to accomplish this with ggplot, so that the varying cumsums can be on the same graph, e.g., like here, or perhaps with stat_bin as here.
Here's a small working sample of the data:
df_so <- structure(list(time.bin = structure(c(1L, 1L, 1L, 1L, 1L, 1L,1L, 124L, 124L, 124L), .Label = c("0:00:00", "0:02:00", "0:04:00","0:06:00", "0:08:00", "0:10:00", "0:12:00", "0:14:00", "0:16:00","0:18:00",
"0:20:00", "0:22:00", "0:24:00", "0:26:00", "0:28:00","0:30:00", "0:32:00", "0:34:00", "0:36:00", "0:38:00", "0:40:00","0:42:00", "0:44:00", "0:46:00", "0:48:00", "0:50:00", "0:52:00","0:54:00", "0:56:00", "0:58:00",
"1:00:00", "1:02:00", "1:04:00","1:06:00", "1:08:00", "1:10:00", "1:12:00", "1:14:00", "1:16:00","1:18:00", "1:20:00", "1:22:00", "1:24:00", "1:26:00", "1:28:00","1:30:00", "1:32:00", "1:34:00", "1:36:00", "1:38:00",
"1:40:00","1:42:00", "1:44:00", "1:46:00", "1:48:00", "1:50:00", "1:52:00","1:54:00", "1:56:00", "1:58:00", "2:00:00", "2:02:00", "2:04:00","2:06:00", "2:08:00", "2:10:00", "2:12:00", "2:14:00", "2:16:00","2:18:00",
"2:20:00", "2:22:00", "2:24:00", "2:26:00", "2:28:00","2:30:00", "2:32:00", "2:34:00", "2:36:00", "2:38:00", "2:40:00","2:42:00", "2:44:00", "2:46:00", "2:48:00", "2:50:00", "2:52:00","2:54:00", "2:56:00", "2:58:00",
"3:00:00", "3:02:00", "3:04:00","3:06:00", "3:08:00", "3:10:00", "3:12:00", "3:14:00", "3:16:00","3:18:00", "3:20:00", "3:22:00", "3:24:00", "3:26:00", "3:28:00","3:30:00", "3:32:00", "3:34:00", "3:36:00", "3:38:00", "3:40:00","3:42:00", "3:44:00", "3:48:00", "3:50:00", "3:52:00", "3:54:00","3:56:00", "3:58:00", "4:00:00", "4:02:00", "4:04:00", "4:06:00","4:08:00"), class = "factor"),
Primary.VB = structure(c(1L,3L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L), .Label = c("", "echoic","intraverbal", "mand", "tact"), class = "factor"),
Secondary.VB = structure(c(1L,1L, 1L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "echoic","intraverbal", "mand", "tact"), class = "factor"),
Tertiary.VB = structure(c(1L,1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "intraverbal","mand", "tact"), class = "factor"), intraverbal = c(0, 1, 0,1, 0, 1, 0, 0, 0, 0),
tact = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),mand = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
echoic = c(0, 0,0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("time.bin", "Primary.VB","Secondary.VB","Tertiary.VB","intraverbal",
"tact", "mand", "echoic"), row.names = c(1L, 2L,3L, 4L, 5L, 6L, 7L, 1648L, 1649L, 1650L), class = "data.frame")

Not an answer, just an extended comment that I'll delete. If we ignore for a second that the x axis represents the factor numbers... does it look alright?
library(dplyr)
library(ggplot2)
as_tibble(df_so) %>%
  group_by(time.bin) %>%
  # running count within each bin, then keep the bin total
  mutate(Csum = cumsum(intraverbal)) %>%
  summarise(last = last(Csum)) %>%
  # cumulative total across bins
  mutate(tCsum = cumsum(last)) %>%
  # factor codes, not real times, end up on the x axis
  mutate(time.bin = as.numeric(time.bin)) %>%
  ggplot(aes(time.bin, tCsum)) +
  geom_step()
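One possible way to get every category on the same graph, with empty bins kept on the axis, is sketched below. It assumes the data can be put in long format (one row per coded response); the `events` data frame, the `type` column and the `all_bins` vector are hypothetical stand-ins, not part of the original data. It relies on tidyr's complete(), which expands a factor to all of its levels, so bins without entries get a count of 0 before cumsum() is applied.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical long-format data: one row per coded response.
events <- data.frame(
  time.bin = c("0:00:00", "0:00:00", "0:00:00", "0:04:00", "0:04:00"),
  type     = c("intraverbal", "intraverbal", "tact", "mand", "intraverbal")
)

# Every 2-minute bin that should appear on the axis, including empty ones.
all_bins <- c("0:00:00", "0:02:00", "0:04:00", "0:06:00")

cum_counts <- events %>%
  mutate(time.bin = factor(time.bin, levels = all_bins)) %>%
  count(time.bin, type) %>%
  # add the missing bin/type combinations with a count of 0
  complete(time.bin, type, fill = list(n = 0)) %>%
  group_by(type) %>%
  arrange(time.bin, .by_group = TRUE) %>%
  mutate(cum_n = cumsum(n)) %>%
  ungroup()

# One step line per response type, bins labelled on the x axis.
p <- ggplot(cum_counts, aes(time.bin, cum_n, color = type, group = type)) +
  geom_step() +
  labs(x = "Time bin", y = "Cumulative count")
```

Printing `p` draws the plot. Keeping time.bin as a factor means the bins are evenly spaced and labelled; converting it to a real time class would be an alternative if irregular spacing matters.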

Related

How to order a plot from lowest to highest?

I looked at other threads where this question is answered, but I couldn't adapt the code. I plot the graph below, and then try to order the countries from lowest to highest according to the blue bars (education == 3) when time is 0. I use the following code to create the order.
country_order <- df %>%
filter(education == 3 & time==0) %>%
arrange(unemployment) %>%
ungroup() %>%
mutate(order = row_number())
However, I am not sure how to introduce the new variable order into ggplot to get the ordering I want. Could someone help?
Here is the plot
ggplot(df, aes(y = unemployment, x = time, fill = education)) +
  geom_col(color = "black") +
  facet_wrap(~ country)
Here is the data:
df= structure(list(time = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L,
2L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
unemployment = structure(c(25, 35, 40, 10, 20, 70, 20, 25,
55, 23, 17, 60), format.stata = "%9.0g"), education = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), country = structure(c(1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2), format.stata = "%9.0g")), row.names = c(NA,
-12L), class = "data.frame")
I think you can use fct_reorder() to reorder the factor levels of the desired variable by sorting along another variable.
library(tidyverse)  # provides dplyr, ggplot2 and forcats (fct_reorder)
df %>%
  ggplot(aes(y = unemployment, x = time, fill = fct_reorder(education, unemployment, .desc = TRUE))) +
  geom_col(color = "black") +
  facet_wrap(~ country)
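If the goal is the order of the panels rather than the order of the fill levels, one option is to turn country into a factor whose levels follow the computed order, since facet_wrap() lays out panels in level order. The sketch below assumes that reading of the question; the data frame is a compact rebuild of the sample data, and `df_ordered` is a name introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

# Compact rebuild of the sample data from the question.
df <- data.frame(
  time         = factor(rep(c(1, 1, 1, 0, 0, 0), 2)),
  unemployment = c(25, 35, 40, 10, 20, 70, 20, 25, 55, 23, 17, 60),
  education    = factor(rep(1:3, 4)),
  country      = rep(c(1, 2), each = 6)
)

# Countries sorted by unemployment for education 3 at time 0.
country_order <- df %>%
  filter(education == "3", time == "0") %>%
  arrange(unemployment) %>%
  pull(country)

# Facets follow factor level order, so set the levels explicitly.
df_ordered <- df %>%
  mutate(country = factor(country, levels = country_order))

p <- ggplot(df_ordered, aes(y = unemployment, x = time, fill = education)) +
  geom_col(color = "black") +
  facet_wrap(~ country)
```

With the sample data, country 2 (60) sorts before country 1 (70), so its panel is drawn first.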

Why in PLM model.matrix gives a column with all 0s? How to solve?

I am running an individual-level fixed-effects OLS using the plm function. The relevant model includes one independent variable with 2 levels that varies between subjects (between-subject treatment) and another independent variable with 2 levels that varies within subjects (within-subject treatment).
The summary of plm does not display the coefficient for the independent variable that varies within subjects. Inspecting model.matrix, I noticed that the column of interest consists of all zeros.
Is there any way to solve the problem? Maybe by resorting to a different type of contrast? Or is it impossible by design to estimate the effect of a within-subject variable in a fixed-effects model like this?
Any help would be much appreciated.
# Reproducible example (unrelated to my actual dataset)
library(plm)
dat <- structure(list(DOILN = c(4.3207, 4.1675, 4.0718, 3.8239, 3.6247,
2.044, 1.3759, 1.4596, 1.486, 4.3136), ROSLN = c(-2.0178, -2.2647,
-4.0632, -3.9933, -3.441, -3.6077, -2.8291, -2.6271, -2.4051,
-1.7239), IRATE = c(-0.0295, -0.1228, 0.00288, 0.03388, -0.0295,
0.00288, 0.03849, 0.03388, 0.07165, 0.04809), GDPGROW = c(0.11731,
0.07891, 0.05072, 0.05745, 0.11731, 0.05072, 0.02142, 0.05745,
0.06645, -0.01765), ISOCode = structure(c(4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 3L), .Label = c("BRA", "CHN", "IND", "RUS"), class = "factor"),
ISOCodeBRA = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), ISOCodeRUS = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 0), ISOCodeIND = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 1), ISOCodeCHN = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0)), .Names = c("DOILN", "ROSLN", "IRATE", "GDPGROW", "ISOCode",
"ISOCodeBRA", "ISOCodeRUS", "ISOCodeIND", "ISOCodeCHN"), row.names = c("120453-2010",
"120453-2011", "120453-2012", "120453-2014", "133431-2010", "133431-2012",
"133431-2013", "133431-2014", "133431-2015", "200448-2009"), class = c("pdata.frame",
"data.frame"), index = structure(list(GCKey = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L), .Label = c("120453", "133431",
"200448"), class = "factor"), FiscalY = structure(c(2L, 3L, 4L,
6L, 2L, 4L, 5L, 6L, 7L, 1L), .Label = c("2009", "2010", "2011",
"2012", "2013", "2014", "2015"), class = "factor")), .Names = c("GCKey",
"FiscalY"), row.names = c(915L, 647L, 35L, 41L, 83L, 68L, 220L,
330L, 497L, 1219L), class = c("pindex", "data.frame")))
mod <- plm(ROSLN ~ DOILN + GDPGROW + IRATE + factor(ISOCode),
           data = dat, model = "within")
model.matrix(mod)
summary(mod)
I think the problem is that you're using a within model and there is no variation on ISOCode within GCKey - which is the index.
> table(index(dat)$GCKey, dat$ISOCode)
         BRA CHN IND RUS
  120453   0   0   0   4
  133431   0   0   0   5
  200448   0   0   1   0
So, applying the within transformation to the ISOCode dummy regressors produces a vector of all zeros. For example, if you used model='pooling', you would see a model matrix that was more like you expected.
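To see why the column collapses, here is a minimal base-R sketch of the within (demeaning) transformation on a toy panel. The `panel` data frame and the `demean` helper are hypothetical and only mimic what the within model does to each regressor; they are not plm's internals.

```r
# Toy panel: `firm` is the individual index, `rus` is a dummy that
# never changes within a firm, `x` does change within a firm.
panel <- data.frame(
  firm = c("a", "a", "a", "b", "b"),
  rus  = c(1, 1, 1, 0, 0),
  x    = c(2, 4, 6, 1, 3)
)

# Within transformation: subtract each individual's own mean.
demean <- function(v, g) v - ave(v, g)

panel$rus_within <- demean(panel$rus, panel$firm)
panel$x_within   <- demean(panel$x, panel$firm)

print(panel)
```

Because rus is constant inside each firm, subtracting the firm mean leaves only zeros, while x keeps its within-firm variation and can still be estimated.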

Multiple vertical shaded area

I am plotting the proportion of deep sleep (y axis) vs days (x axis). I would like to add vertical shaded area for a better understanding (e.g. grey for week-ends, orange for sick period...).
I have tried using geom_ribbon (I created a variable that takes the value 30, which is the top of my y axis, when the day falls on a week-end; that information is given in another column), but instead of rectangles I get trapezoids.
In another post, someone proposed using "geom_rect", or "annotate" if one knows the x and y coordinates, but I don't see how to adapt it to my case, where I want the colored area repeated for every week-end (it is not exactly every 7 days because some data are missing).
Do you have any idea?
Many thanks in advance!
ggplot(Sleep.data, aes(x = DATEID)) +
geom_line(aes(y = P.DEEP, group = 1), col = "deepskyblue3") +
geom_point(aes(y = P.DEEP, group = 1, col = Sign.deep)) +
guides(col=FALSE) +
geom_ribbon(aes(ymin = min, ymax = max.WE), fill = '#6495ED80') +
facet_grid(MONTH~.) +
geom_hline(yintercept = 15, col = "forestgreen") +
geom_hline(yintercept = 20, col = "forestgreen", linetype = "dashed") +
geom_vline(xintercept = c(7,14,21,28), col = "grey") +
scale_x_continuous(breaks=seq(0,28,7)) +
scale_y_continuous(breaks=seq(0,30,5)) +
labs(x = "Days",y="Proportion of deep sleep stage", title = "Deep sleep")
Proportion of deep sleep vs time
Head(Sleep.data)
> dput(head(Sleep.data))
structure(list(DATE = structure(c(1L, 4L, 7L, 10L, 13L, 16L), .Label = c("01-Dec-17",
"01-Feb-18", "01-Jan-18", "02-Dec-17", "02-Feb-18", "02-Jan-18",
"03-Dec-17", "03-Feb-18", "03-Jan-18", "04-Dec-17", "04-Feb-18",
"04-Jan-18", "05-Dec-17", "05-Feb-18", "05-Jan-18", "06-Dec-17",
"06-Feb-18", "06-Jan-18", "07-Dec-17", "07-Feb-18", "07-Jan-18",
"08-Dec-17", "08-Jan-18", "09-Dec-17", "09-Feb-18", "09-Jan-18",
"10-Dec-17", "10-Jan-18", "11-Dec-17", "11-Feb-18", "11-Jan-18",
"12-Dec-17", "12-Jan-18", "13-Dec-17", "13-Feb-18", "13-Jan-18",
"14-Dec-17", "14-Feb-18", "14-Jan-18", "15-Dec-17", "15-Jan-18",
"16-Dec-17", "16-Jan-18", "17-Dec-17", "17-Jan-18", "18-Dec-17",
"18-Jan-18", "19-Dec-17", "19-Jan-18", "20-Dec-17", "21-Dec-17",
"21-Jan-18", "22-Dec-17", "22-Jan-18", "23-Dec-17", "23-Jan-18",
"24-Dec-17", "24-Jan-18", "25-Dec-17", "25-Jan-18", "26-Dec-17",
"26-Jan-18", "27-Dec-17", "27-Jan-18", "28-Dec-17", "28-Jan-18",
"29-Dec-17", "29-Jan-18", "30-Dec-17", "30-Jan-18", "31-Dec-17",
"31-Jan-18"), class = "factor"), DATEID = 1:6, MONTH = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("Decembre", "Janvier", "Février"
), class = "factor"), DURATION = c(8.08, 7.43, 6.85, 6.23, 7.27,
6.62), D.DEEP = c(1.67, 1.37, 1.62, 1.75, 1.95, 0.9), P.DEEP = c(17L,
17L, 21L, 24L, 25L, 12L), STIMS = c(0L, 0L, 0L, 0L, 390L, 147L
), D.REM = c(1.7, 0.95, 0.95, 1.43, 1.47, 0.72), P.REM = c(17L,
11L, 12L, 20L, 19L, 9L), D.LIGHT = c(4.7, 5.12, 4.27, 3.05, 3.83,
4.98), P.LIGHT = c(49L, 63L, 55L, 43L, 49L, 66L), D.AWAKE = c(1.45,
0.58, 0.47, 0.87, 0.37, 0.85), P.AWAKE = c(15L, 7L, 6L, 12L,
4L, 11L), WAKE.UP = c(-2L, 0L, 2L, -1L, 3L, 1L), AGITATION = c(-1L,
-3L, -1L, -2L, 2L, -1L), FRAGMENTATION = c(1L, -2L, 2L, 1L, 0L,
-1L), PERIOD = structure(c(3L, 3L, 4L, 4L, 4L, 4L), .Label = c("HOLIDAYS",
"SICK", "WE", "WORK"), class = "factor"), SPORT = structure(c(2L,
1L, 2L, 2L, 2L, 1L), .Label = c("", "Day", "Evening"), class = "factor"),
ACTIVITY = structure(c(6L, 1L, 3L, 4L, 5L, 1L), .Label = c("",
"Bkool", "eBike", "Gym", "Natation", "Run"), class = "factor"),
TABLETS = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5), Ratio = c(1.15,
2.36, 3.45, 2.01, 5.27, 1.06), Sign = structure(c(2L, 2L,
2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"),
Sign.ratio = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0",
"1"), class = "factor"), Sign.deep = structure(c(2L, 2L,
2L, 2L, 2L, 1L), .Label = c("0", "1"), class = "factor"),
Sign.awake = structure(c(1L, 2L, 2L, 1L, 2L, 1L), .Label = c("0",
"1"), class = "factor"), Sign.light = structure(c(2L, 1L,
1L, 2L, 2L, 1L), .Label = c("0", "1"), class = "factor"),
index = structure(c(1L, 1L, 1L, 1L, 2L, 1L), .Label = c("0",
"1"), class = "factor"), min = c(0, 0, 0, 0, 0, 0), max.WE = c(30,
30, 0, 0, 0, 0)), .Names = c("DATE", "DATEID", "MONTH", "DURATION",
"D.DEEP", "P.DEEP", "STIMS", "D.REM", "P.REM", "D.LIGHT", "P.LIGHT",
"D.AWAKE", "P.AWAKE", "WAKE.UP", "AGITATION", "FRAGMENTATION",
"PERIOD", "SPORT", "ACTIVITY", "TABLETS", "Ratio", "Sign", "Sign.ratio",
"Sign.deep", "Sign.awake", "Sign.light", "index", "min", "max.WE"
), row.names = c(NA, 6L), class = "data.frame")
Thanks for adding the data, that makes it easier to understand exactly what you're working with and to confirm that an answer actually addresses your question.
I thought it would be helpful to make a separate table with just the start and end of each contiguous set of rows with the same PERIOD. I did this using dplyr::case_when, assuming we should mark dates as a "start" if they are the first row in the table (row_number() == 1), or they have a different PERIOD value than the prior row. I mark dates as an "end" if they are the last row of the table, or have a different PERIOD than the next row. I only keep the starts and ends, and spread these into new columns called start and end.
library(tidyverse)
Period_ranges <- Sleep.data %>%
mutate(period_status = case_when(row_number() == 1 ~ "start",
PERIOD != lag(PERIOD) ~ "start",
row_number() == n() ~ "end",
PERIOD != lead(PERIOD) ~ "end",
TRUE ~ "other")) %>%
filter(period_status %in% c("start", "end")) %>%
select(DATEID, PERIOD, period_status) %>%
mutate(PERIOD_NUM = cumsum(PERIOD != lag(PERIOD) | row_number() == 1)) %>%
spread(period_status, DATEID)
# Output based on sample data only. If there's a problem with the full data, please add more. To share full data, use `dput(Sleep.data)` or to share 20 rows use `dput(head(Sleep.data, 20))`.
> Period_ranges
  PERIOD PERIOD_NUM end start
1     WE          1   2     1
2   WORK          2   6     3
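A table of this shape can also be built in base R with rle(), which is sketched here as an alternative under the assumption that the rows are already sorted by DATEID. The `period` and `date_id` vectors are hypothetical stand-ins for the PERIOD and DATEID columns. As far as I can tell, the case_when() version labels a run consisting of a single row only as "start", leaving its end missing; the run-length approach gives such a run the same start and end.

```r
# Hypothetical stand-ins for Sleep.data$PERIOD and Sleep.data$DATEID.
period  <- c("WE", "WE", "WORK", "WORK", "SICK", "WORK")
date_id <- 1:6

# Run-length encoding: one entry per contiguous block of equal values.
runs <- rle(period)

# Row positions where each block ends and starts.
end_row   <- cumsum(runs$lengths)
start_row <- end_row - runs$lengths + 1

ranges <- data.frame(
  PERIOD = runs$values,
  start  = date_id[start_row],
  end    = date_id[end_row]
)

print(ranges)
```

The resulting `ranges` data frame has the start and end columns that geom_rect() needs for xmin and xmax.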
We can now use that in the plot. If you want to show only some PERIOD types, or adjust their appearance separately, you could pass a filtered table instead, e.g. Period_ranges %>% filter(PERIOD == "WE").
ggplot(Sleep.data, aes(x = DATEID)) +
  # Here I specify that this geom should use its own data.
  # I start the rectangles half a day before and end half a day after to fill the space.
  geom_rect(data = Period_ranges, inherit.aes = FALSE,
            aes(xmin = start - 0.5, xmax = end + 0.5,
                ymin = 0, ymax = 30,
                fill = PERIOD), alpha = 0.5) +
  # Here we can specify the shading color for each type of PERIOD
  scale_fill_manual(values = c(
    "WE" = '#6495ED80',
    "WORK" = "gray60"
  )) +
  # rest of your code
Chart based on data sample:

Remove box and points in legend

How do I remove the box, ribbon color, and points in the legend? I would just like a straight line in the color of each effect. I've tried using guides(), but nothing changes.
Sample data:
pdat1 <- structure(list(type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("10-year",
"20-year", "30-year"), class = "factor"), effect = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), value = c(0,
-21.89, -27.36, -33.75, -40.57, -47.32, 0, -23, -28.31, -34.96,
-42.6, -50.81, 0, -16.9, -22.25, -28.87, -36.4, -44.52, 0, -10.24,
-16.8, -24.74, -33.52, -42.55, 0, -10.24, -16.8, -24.74, -33.52,
-42.55, 0, -10.24, -16.8, -24.74, -33.52, -42.55), temp = c(0,
1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3,
4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5), value_max = c(2.91,
-19.02, -24.42, -30.88, -37.63, -44.35, 2.9, -20.09, -25.36,
-32.05, -39.67, -47.87, 2.97, -14.02, -19.27, -25.89, -33.49,
-41.58, 2.42, -7.74, -14.34, -22.27, -31.06, -40.02, 2.45, -7.8,
-14.36, -22.26, -31.07, -40.07, 2.46, -7.71, -14.23, -22.23,
-31.02, -40.05), value_min = c(-2.91, -24.76, -30.3, -36.63,
-43.5, -50.3, -2.9, -25.91, -31.27, -37.87, -45.52, -53.75, -2.97,
-19.77, -25.24, -31.85, -39.32, -47.46, -2.42, -12.74, -19.26,
-27.21, -35.98, -45.08, -2.45, -12.68, -19.24, -27.22, -35.96,
-45.02, -2.46, -12.77, -19.37, -27.25, -36.02, -45.05)), class = "data.frame", row.names = c(NA,
-36L), .Names = c("type", "effect", "value", "temp", "value_max",
"value_min"))
Plot Code
library(ggplot2)
ggplot(pdat1) +
geom_ribbon(aes(ymax = value_max, ymin = value_min, x = temp, linetype = NA, color = effect, fill = effect), fill = "#C0CCD9", alpha = 0.5 ) +
geom_line(aes(x = temp, y = value, color = effect, group = effect)) +
geom_point(aes(x = temp, y = value, color = effect), size = 0.5) +
ylab("Y") +
xlab("X") +
guides(color = guide_legend(keywidth = 2,
keyheight = 1,
override.aes = list(linetype = c(1, 1),
size = 1,
shape = c(0, 0)))) +
facet_wrap(~type)
Your ggplot code is a little bit messy, particularly for the ribbon. For example, the fill aesthetic is both mapped to the effect variable and set to a color value (#C0CCD9).
To remove the boxes in the legend key you need to use legend.key in theme, but it works only after cleaning up your ggplot code.
To avoid unnecessary repetition I have moved several aesthetics to the first ggplot call, so that ggplot uses them as defaults for the subsequent geom_XX calls.
ggplot(pdat1, aes(x = temp, y = value, group = effect)) +
geom_ribbon(aes(ymax = value_max, ymin = value_min), fill = "#C0CCD9", alpha = 0.5 ) +
geom_line(aes(color = effect)) +
geom_point(aes(color = effect), size = 0.5) +
ylab("Y") + xlab("X") +
guides(color = guide_legend(keywidth = 2, keyheight = 1,
override.aes = list(size = 1, shape = NA))) +
facet_wrap(~type) +
theme_bw() +
theme(legend.key = element_rect(fill = NA, color = NA))

Reproducible example and dput error

I'm trying to reproduce a data frame and dput is not cooperating.
dput command :
dput(head(data, 10))
dput output :
structure(list(lexptot = c(8.28377505197124, 9.1595012302023,
8.14707583238833, 9.86330744180814, 8.21391453619232, 8.92372556833205,
7.77219149815994, 8.58202430280175, 8.34096828565733, 10.1133857229336
), year = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L), dfmfdyr = c(0,
1, 0, 1, 0, 1, 0, 1, 0, 1), dfmfd98 = c(1, 1, 1, 1, 1, 1, 1,
1, 1, 1), nh = c(11054L, 11054L, 11061L, 11061L, 11081L, 11081L,
11101L, 11101L, 12021L, 12021L)), .Names = c("lexptot", "year",
"dfmfdyr", "dfmfd98", "nh"), vars = list(nh), drop = TRUE, indices = list(
0:1, 2:3, 4:5, 6:7, 8:9), group_sizes = c(2L, 2L, 2L, 2L,
2L), biggest_group_size = 2L, labels = structure(list(nh = c(11054L,
11061L, 11081L, 11101L, 12021L)), class = "data.frame", row.names = c(NA,
-5L), .Names = "nh", vars = list(nh)), row.names = c(NA, 10L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Error :
Error in structure(list(lexptot = c(8.28377505197124, 9.1595012302023, :
object 'nh' not found
Why is this happening right from a dput command?
Edit :
Relevant posts, but suggestions did not work.
Why does this dplyr dput not work?
Edit 2 :
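It appears that because the data frame is grouped (class grouped_df), dput cannot reproduce it. The solution is to call ungroup() on the data, then rerun dput, and all works.
The issue was that the data frame was grouped: the dput output carries a vars = list(nh) attribute that refers to nh as a bare symbol, so evaluating the output fails with object 'nh' not found. The solution was to ungroup() the data, assigning the result back, before calling dput().
library(dplyr)
data <- ungroup(data)
dput(head(data, 10))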
New Data.frame :
structure(list(lexptot = c(8.28377505197124, 9.1595012302023,
8.14707583238833, 9.86330744180814, 8.21391453619232, 8.92372556833205,
7.77219149815994, 8.58202430280175, 8.34096828565733, 10.1133857229336
), year = c(0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L), dfmfd98 = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1), dfmfd = c(0L, 1L, 0L, 1L, 1L, 1L,
1L, 1L, 1L, 1L)), .Names = c("lexptot", "year", "dfmfd98", "dfmfd"
), class = c("tbl_df", "data.frame"), row.names = c(NA, -10L))
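A quick way to check that a dput() result is actually reproducible is to evaluate the text it prints and compare the rebuilt object with the original. The sketch below assumes a small hypothetical grouped data frame (`data`, with columns `nh` and `y`) and also converts to a plain data.frame, which is an extra step not taken in the answer above but tends to give the shortest output.

```r
library(dplyr)

# Hypothetical grouped data, standing in for the real data set.
data <- tibble(nh = c(1L, 1L, 2L), y = c(1, 2, 3)) %>%
  group_by(nh)

# Drop the grouping (and the tibble classes) before sharing.
plain <- data %>%
  ungroup() %>%
  as.data.frame()

# Capture what dput() prints, then rebuild the object from that text.
txt     <- capture.output(dput(plain))
rebuilt <- eval(parse(text = paste(txt, collapse = "\n")))

# TRUE when the pasted dput() output recreates the same data frame.
print(isTRUE(all.equal(rebuilt, plain)))
```

Running the same round trip on your own head(data, 10) before posting shows whether others will be able to paste the output into a fresh session.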
