ggplot Plotting Each Point Twice - r

I am trying to make an animated bubble chart for a baseball league I'm in. Once I create the animated graph and convert it into a gif, it plots each team twice, as shown in the picture below. The legend should only hold 14 points/teams, but it shows 28 instead.
My code is the following:
library(ggplot2)
library(gganimate)
library(readxl)
library(gifski)
library(png)
myData <- read_excel("~/Desktop/Dynasty - Fantasy Baseball.xlsx")
# Make a ggplot, but add frame=year: one image per year
g <- ggplot(myData, aes(PF, PA, size = `W%`, color = Team)) +
  geom_point() +
  theme_bw() +
  # gganimate specific bits:
  labs(title = 'Period: {frame_time-1900}', x = 'Points For', y = 'Points Against') +
  transition_time(Year) +
  ease_aes('linear')
# Save as a gif:
anim_save(filename = "~/Desktop/FantasyBaseballAnimated.gif", animation = g)
My data, as produced by dput, is the following:
structure(list(Team = c("Houston Astros", "Miami Marlins", "New York Mets",
"Atlanta Braves", "St. Louis Cardinals", "Cincinatti Reds", "Philadelphia Reds",
"Baltimore Orioles", "Milwaukee Brewers", "Washington Nationals",
"Montreal Expos", "Tampa Bay Rays", "Seattle Mariners", "Brooklyn Dodgers",
"Houston Astros", "Miami Marlins", "New York Mets", "Atlanta Braves",
"St. Louis Cardinals", "Cincinatti Reds", "Philadelphia Reds",
"Baltimore Orioles", "Milwaukee Brewers", "Washington Nationals",
"Montreal Expos", "Tampa Bay Rays", "Seattle Mariners", "Brooklyn Dodgers",
"New York Mets ", "St. Louis Cardinals ", "Cincinatti Reds ",
"Washington Nationals ", "Atlanta Braves ", "Miami Marlins ",
"Philadelphia Phillies ", "Tampa Bay Rays ", "Houston Astros ",
"Montreal Expos ", "Baltimore Orioles ", "Milwaukee Brewers ",
"Seattle Mariners ", "Brooklyn Dodgers ", "St. Louis Cardinals ",
"Washington Nationals ", "Miami Marlins ", "Cincinatti Reds ",
"New York Mets ", "Atlanta Braves ", "Tampa Bay Rays ", "Houston Astros ",
"Milwaukee Brewers ", "Philadelphia Phillies ", "Baltimore Orioles ",
"Montreal Expos ", "Seattle Mariners ", "Brooklyn Dodgers ",
"Washington Nationals ", "St. Louis Cardinals ", "Atlanta Braves ",
"Cincinatti Reds ", "New York Mets ", "Houston Astros ", "Miami Marlins ",
"Philadelphia Phillies ", "Tampa Bay Rays ", "Milwaukee Brewers ",
"Baltimore Orioles ", "Montreal Expos ", "Seattle Mariners ",
"Brooklyn Dodgers ", "St. Louis Cardinals ", "Washington Nationals ",
"Philadelphia Phillies ", "Miami Marlins ", "Atlanta Braves ",
"New York Mets ", "Houston Astros ", "Milwaukee Brewers ",
"Cincinatti Reds ", "Tampa Bay Rays ", "Montreal Expos ",
"Baltimore Orioles ", "Seattle Mariners ", "Brooklyn Dodgers ",
"New York Mets ", "St. Louis Cardinals ", "Washington Nationals ",
"Philadelphia Phillies ", "Miami Marlins ", "Houston Astros ",
"Atlanta Braves ", "Milwaukee Brewers ", "Cincinatti Reds ",
"Tampa Bay Rays ", "Montreal Expos ", "Baltimore Orioles ",
"Seattle Mariners ", "Brooklyn Dodgers ", "St. Louis Cardinals ",
"Washington Nationals ", "Houston Astros ", "New York Mets ",
"Philadelphia Phillies ", "Milwaukee Brewers ", "Atlanta Braves ",
"Miami Marlins ", "Cincinatti Reds ", "Tampa Bay Rays ", "Baltimore Orioles ",
"Montreal Expos ", "Seattle Mariners ", "Brooklyn Dodgers "
), W = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 9, 8,
7, 6, 6, 5, 6, 5, 4, 3, 2, 2, 2, 17, 17, 16, 14, 14, 14, 12,
11, 13, 7, 7, 6, 3, 3, 25, 24, 22, 21, 20, 20, 18, 19, 16, 14,
12, 9, 8, 5, 33, 32, 27, 27, 25, 26, 25, 23, 21, 21, 16, 15,
11, 7, 37, 37, 35, 34, 33, 32, 32, 29, 29, 27, 21, 19, 17, 7,
44, 43, 43, 40, 38, 40, 37, 37, 35, 32, 25, 23, 20, 7, 52, 50,
50, 48, 48, 43, 42, 40, 41, 38, 34, 28, 25, 8), L = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 3, 4, 6, 5, 6, 5, 6,
7, 8, 9, 10, 5, 5, 7, 7, 8, 9, 9, 9, 11, 14, 15, 15, 19, 21,
8, 9, 11, 13, 13, 13, 14, 16, 17, 19, 21, 22, 26, 31, 11, 12,
16, 19, 18, 19, 20, 22, 21, 22, 28, 28, 33, 40, 18, 18, 22, 22,
22, 22, 25, 25, 28, 27, 34, 36, 38, 52, 22, 22, 22, 28, 27, 29,
28, 28, 33, 31, 42, 42, 46, 64, 25, 27, 31, 30, 32, 33, 34, 37,
39, 37, 43, 51, 53, 75), T = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2, 1, 0, 2, 2, 1,
3, 2, 1, 3, 4, 0, 3, 2, 3, 2, 0, 3, 3, 3, 2, 3, 3, 4, 1, 3, 3,
3, 5, 2, 0, 4, 4, 5, 2, 5, 3, 3, 3, 6, 5, 4, 5, 4, 1, 5, 5, 3,
4, 5, 6, 3, 6, 3, 6, 5, 5, 5, 1, 6, 7, 7, 4, 7, 3, 7, 7, 4, 9,
5, 7, 6, 1, 7, 7, 3, 6, 4, 8, 8, 7, 4, 9, 7, 5, 6, 1), `W%` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.833, 0.792, 0.75, 0.667,
0.583, 0.5, 0.5, 0.5, 0.5, 0.417, 0.333, 0.25, 0.208, 0.167,
0.75, 0.75, 0.688, 0.646, 0.625, 0.604, 0.562, 0.542, 0.542,
0.354, 0.333, 0.312, 0.167, 0.125, 0.736, 0.708, 0.653, 0.611,
0.597, 0.597, 0.556, 0.542, 0.486, 0.431, 0.375, 0.319, 0.25,
0.139, 0.729, 0.708, 0.615, 0.583, 0.573, 0.573, 0.552, 0.51,
0.5, 0.49, 0.375, 0.365, 0.271, 0.156, 0.658, 0.658, 0.608, 0.6,
0.592, 0.583, 0.558, 0.533, 0.508, 0.5, 0.392, 0.358, 0.325,
0.125, 0.653, 0.646, 0.646, 0.583, 0.576, 0.576, 0.562, 0.562,
0.514, 0.507, 0.382, 0.368, 0.319, 0.104, 0.661, 0.637, 0.613,
0.607, 0.595, 0.56, 0.548, 0.518, 0.512, 0.506, 0.446, 0.363,
0.333, 0.101), `Div Rec` = c("0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0-0-0", "0-0-0", "37470",
"0-0-0", "0-0-0", "36683", "0-0-0", "36683", "0-0-0", "0-0-0",
"0-0-0", "37295", "0-0-0", "0-0-0", "17-5-2", "0-0-0", "36683",
"0-0-0", "36712", "36653", "0-0-0", "37295", "36594", "0-0-0",
"36683", "0-0-0", "0-0-0", "0-0-0", "37106", "36801", "36653",
"37207", "20-13-3", "13-10-1", "37512", "36594", "0-0-0", "36566",
"36683", "0-0-0", "36653", "0-0-0", "19-4-1", "37106", "13-10-1",
"37207", "25-18-5", "37541", "36754", "36843", "37512", "37381",
"36683", "0-0-0", "37482", "36931", "13-9-2", "19-4-1", "23-13-0",
"17-18-1", "13-10-1", "25-18-5", "37541", "37381", "13-21-2",
"15-19-2", "36683", "36683", "14-19-3", "36943", "25-18-5", "13-9-2",
"25-8-3", "28-19-1", "17-18-1", "18-16-2", "13-10-1", "13-8-3",
"19-26-3", "15-19-2", "36813", "37541", "17-27-4", "36943", "22-12-2",
"25-8-3", "18-16-2", "25-18-5", "28-19-1", "13-8-3", "13-10-1",
"17-18-1", "19-26-3", "15-19-2", "21-13-2", "13-23-0", "17-27-4",
"3-32-1"), GB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0.5, 1, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7.5, 8, 0, 0, 1.5, 2.5, 3,
3.5, 4.5, 5, 5, 9.5, 10, 10.5, 14, 15, 0, 1, 3, 4.5, 5, 5, 6.5,
7, 9, 11, 13, 15, 17.5, 21.5, 0, 1, 5.5, 7, 7.5, 7.5, 8.5, 10.5,
11, 11.5, 17, 17.5, 22, 27.5, 0, 0, 3, 3.5, 4, 4.5, 6, 7.5, 9,
9.5, 16, 18, 20, 32, 0, 0.5, 0.5, 5, 5.5, 5.5, 6.5, 6.5, 10,
10.5, 19.5, 20.5, 24, 39.5, 0, 2, 4, 4.5, 5.5, 8.5, 9.5, 12,
12.5, 13, 18, 25, 27.5, 47), PF = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 10, 9.5, 9, 8, 7, 6, 6, 6, 6, 5, 4, 3, 2.5, 2,
18, 18, 16.5, 15.5, 15, 14.5, 13.5, 13, 13, 8.5, 8, 7.5, 4, 3,
26.5, 25.5, 23.5, 22, 21.5, 21.5, 20, 19.5, 17.5, 15.5, 13.5,
11.5, 9, 5, 35, 34, 29.5, 28, 27.5, 27.5, 26.5, 24.5, 24, 23.5,
18, 17.5, 13, 7.5, 39.5, 39.5, 36.5, 36, 35.5, 35, 33.5, 32,
30.5, 30, 23.5, 21.5, 19.5, 7.5, 47, 46.5, 46.5, 42, 41.5, 41.5,
40.5, 40.5, 37, 36.5, 27.5, 26.5, 23, 7.5, 55.5, 53.5, 51.5,
51, 50, 47, 46, 43.5, 43, 42.5, 37.5, 30.5, 28, 8.5), PA = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2.5, 3, 4, 5, 6, 6,
6, 6, 7, 8, 9, 9.5, 10, 6, 6, 7.5, 8.5, 9, 9.5, 10.5, 11, 11,
15.5, 16, 16.5, 20, 21, 9.5, 10.5, 12.5, 14, 14.5, 14.5, 16,
16.5, 18.5, 20.5, 22.5, 24.5, 27, 31, 13, 14, 18.5, 20, 20.5,
20.5, 21.5, 23.5, 24, 24.5, 30, 30.5, 35, 40.5, 20.5, 20.5, 23.5,
24, 24.5, 25, 26.5, 28, 29.5, 30, 36.5, 38.5, 40.5, 52.5, 25,
25.5, 25.5, 30, 30.5, 30.5, 31.5, 31.5, 35, 35.5, 44.5, 45.5,
49, 64.5, 28.5, 30.5, 32.5, 33, 34, 37, 38, 40.5, 41, 41.5, 46.5,
53.5, 56, 75.5), Period = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), Place = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), Year = c(1900,
1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900,
1900, 1900, 1901, 1901, 1901, 1901, 1901, 1901, 1901, 1901, 1901,
1901, 1901, 1901, 1901, 1901, 1902, 1902, 1902, 1902, 1902, 1902,
1902, 1902, 1902, 1902, 1902, 1902, 1902, 1902, 1903, 1903, 1903,
1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903,
1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904,
1904, 1904, 1904, 1905, 1905, 1905, 1905, 1905, 1905, 1905, 1905,
1905, 1905, 1905, 1905, 1905, 1905, 1906, 1906, 1906, 1906, 1906,
1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1907, 1907,
1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907,
1907)), row.names = c(NA, -112L), class = c("tbl_df", "tbl",
"data.frame"))
I thought converting Team to a factor would work, and I also tried parsing it, but neither approach fixed it:
#first thought
myData$Team <- factor(myData$Team)
summary(myData)
#second thought
myData$Team <- eval(parse(text = myData$Team))
Am I just missing something obvious? I'm drawing a blank at how I could fix this. Any help would be greatly appreciated!

It looks like you need to do some data cleaning:
library(dplyr)
myData %>%
  group_by(Team) %>%
  summarise(count = n())
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 28 x 2
Team count
<chr> <int>
1 "Atlanta Braves" 2
2 "Atlanta Braves " 6
3 "Baltimore Orioles" 2
4 "Baltimore Orioles " 6
5 "Brooklyn Dodgers" 2
6 "Brooklyn Dodgers " 6
7 "Cincinatti Reds" 2
8 "Cincinatti Reds " 6
9 "Houston Astros" 2
10 "Houston Astros " 6
# ... with 18 more rows
Using stringr:
library(stringr)
myData <- myData %>%
  mutate(Team = str_trim(Team, side = "both"))

Answer
Remove the whitespace around the names:
myData$Team <- trimws(myData$Team)
Rationale
You actually have each team in there twice: half of the names carry a single trailing space. You may want to look into why that is happening in the source spreadsheet.
table(myData$Team, myData$Year)[1:2, ]
# 1900 1901 1902 1903 1904 1905 1906 1907
# Atlanta Braves 1 1 0 0 0 0 0 0
# Atlanta Braves 0 0 1 1 1 1 1 1
sort(unique(myData$Team))[1:2]
#[1] "Atlanta Braves" "Atlanta Braves "

Related

When joining dfs with dplyr e.g. inner_join, keep matching cols from one df only

Example:
library(tidyverse)
mtcars1 <- mtcars %>% mutate(rn = row_number(), blah = rnorm(n(), 10, 1))
mtcars2 <- mtcars %>% mutate(rn = row_number(), blah2 = rnorm(n(), 5, 1))
mtcars_combined <- mtcars1 %>% inner_join(mtcars2, by = 'rn')
mtcars_combined %>% glimpse
Rows: 32
Columns: 25
$ mpg.x <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, …
$ cyl.x <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4
$ disp.x <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6, 167.6, 275.8, 275.8, 275.8, 472.0, 460.0, 44…
$ hp.x <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 1…
$ drat.x <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07, 3.07, 3.07, 2.93, 3.00, 3.23, 4.08, 4.93, …
$ wt.x <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440, 3.440, 4.070, 3.730, 3.780, 5.250, 5.424, 5.…
$ qsec.x <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30, 18.90, 17.40, 17.60, 18.00, 17.98, 17.82, 17…
$ vs.x <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
$ am.x <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1
$ gear.x <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4
$ carb.x <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2
$ rn <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,…
$ blah.x <dbl> 9.652697, 10.497945, 9.402642, 10.134072, 9.645391, 10.177435, 10.691140, 10.800154, 10.005802, 10.681475, 8.91997…
$ mpg.y <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, …
$ cyl.y <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4
$ disp.y <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6, 167.6, 275.8, 275.8, 275.8, 472.0, 460.0, 44…
$ hp.y <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 150, 245, 1…
$ drat.y <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07, 3.07, 3.07, 2.93, 3.00, 3.23, 4.08, 4.93, …
$ wt.y <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440, 3.440, 4.070, 3.730, 3.780, 5.250, 5.424, 5.…
$ qsec.y <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30, 18.90, 17.40, 17.60, 18.00, 17.98, 17.82, 17…
$ vs.y <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
$ am.y <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1
$ gear.y <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4
$ carb.y <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2
$ blah.y <dbl> 6.047953, 4.379261, 4.609405, 4.420695, 6.545795, 4.962723, 5.955824, 5.011969, 5.617293, 4.347312, 3.126674, 4.13…
I only joined on one field, rn. Because there are multiple matching column names, they get suffixed with .x and .y. Of course, I could just have joined onto a smaller df, e.g.
mtcars_combined <- mtcars1 %>% inner_join(mtcars2 %>% select(rn, blah2), by = 'rn')
But I'd like to know: is there a clever way to tell R to just keep the matching fields from the left side and drop any duplicate fields coming from the right?
One approach is to make use of the suffix argument and drop the duplicated cols using select:
library(dplyr)
mtcars1 <- mtcars %>% mutate(rn = row_number(), blah = rnorm(n(), 10, 1))
mtcars2 <- mtcars %>% mutate(rn = row_number(), blah2 = rnorm(n(), 5, 1))
mtcars_combined <- mtcars1 %>% inner_join(mtcars2, by = 'rn', suffix = c("", "_drop"))
mtcars_combined <- select(mtcars_combined, -ends_with("_drop"))
glimpse(mtcars_combined)
#> Rows: 32
#> Columns: 14
#> $ mpg <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 1...
#> $ cyl <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4...
#> $ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8...
#> $ hp <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180,...
#> $ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3...
#> $ wt <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150...
#> $ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90...
#> $ vs <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1...
#> $ am <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0...
#> $ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3...
#> $ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1...
#> $ rn <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18...
#> $ blah <dbl> 10.856380, 9.634127, 10.280296, 10.153320, 9.255293, 10.38564...
#> $ blah2 <dbl> 5.724742, 5.740158, 4.743665, 5.337721, 4.239426, 5.989236, 4...
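An alternative sketch (assuming rn is the only join key): drop the overlapping columns from the right-hand table before joining, so no suffixes are created in the first place.
library(dplyr)
dup_cols <- setdiff(intersect(names(mtcars1), names(mtcars2)), "rn")
mtcars_combined <- mtcars1 %>%
  inner_join(select(mtcars2, -all_of(dup_cols)), by = 'rn')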

Having trouble using tidyr pivot_wider to spread data

I have a dataset comparing 15 hybrids, each with 5 separate measurements. I am trying to spread the data into a wider dataset using pivot_wider for a regression analysis, since spread() would not work (probably because of the repeated observations).
The dataset I am working with is below:
data <- structure(list(hybrid = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11,
11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12,
12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13,
13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14,
14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15), measurement = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4,
4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1,
1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3,
3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5,
5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2,
2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4,
4, 5, 5, 5, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 1, 1, 1, 2, 2, 2, 3,
3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5,
5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2,
2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4,
4, 4, 5, 5, 5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5), value = c(245,
889, 450, 45, 515, 318, 956, 434, 29, 740, 156, 516, 767, 292,
753, 573, 636, 611, 777, 557, 408, 95, 482, 227, 495, 360, 55,
76, 393, 37, 667, 802, 724, 900, 885, 191, 79, 143, 531, 398,
324, 129, 172, 467, 25, 101, 476, 629, 915, 122, 498, 649, 354,
527, 920, 788, 565, 552, 586, 127, 461, 307, 77, 552, 198, 240,
816, 144, 136, 781, 593, 421, 233, 264, 812, 407, 492, 932, 940,
139, 764, 200, 352, 754, 271, 506, 381, 973, 678, 848, 432, 358,
218, 736, 287, 411, 220, 264, 531, 669, 666, 727, 841, 792, 79,
460, 159, 426, 90, 395, 793, 507, 262, 814, 157, 641, 230, 870,
304, 591, 636, 277, 534, 783, 562, 938, 889, 68, 557, 892, 809,
157, 71, 54, 256, 246, 301, 823, 622, 953, 6, 66, 556, 902, 207,
832, 248, 540, 192, 65, 381, 712, 15, 323, 1, 193, 146, 637,
488, 158, 289, 839, 229, 237, 273, 978, 560, 969, 898, 204, 335,
930, 444, 968, 920, 398, 303, 318, 975, 182, 630, 4, 624, 271,
272, 438, 661, 728, 32, 106, 473, 465, 498, 33, 189, 918, 704,
605, 867, 240, 833, 497, 514, 241, 860, 228, 643, 791, 4, 898,
574, 225, 339, 365, 387, 548, 88, 604, 283)), class = "data.frame", row.names = c(NA,
-219L))
I'm new to the pivot_wider function, so when I run my code, I get an error:
data %>%
  pivot_wider(cols = -hybrid, names_to = c("1","2","3","4","5"))
Error in pivot_wider(., cols = -hybrid, names_to = c("1", "2", "3", "4", :
unused arguments (cols = -hybrid, names_to = c("1", "2", "3", "4", "5"))
How can I spread this data so that I have the columns hybrid, 1, 2, 3, 4, 5 (with the values under the columns entitled 1:5)?
My guess is that you are looking for this:
library(tidyr)
pivot_wider(data, id_cols = hybrid, names_from = measurement, values_from = "value", values_fn = sum)
# # A tibble: 15 x 6
# hybrid `1` `2` `3` `4` `5`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 1584 878 1419 1412 1812
# 2 2 1820 1742 804 910 506
# 3 3 2193 1976 753 851 664
# 4 4 1206 1535 1530 2273 1265
# 5 5 845 990 1096 1795 1309
# 6 6 1831 1843 1306 1158 2499
# 7 7 1008 1434 1015 2062 1712
# 8 8 1045 1278 1583 1028 1765
# 9 9 913 1317 1500 957 1449
# 10 10 1037 556 1746 1025 1665
# 11 11 1620 638 1050 340 1283
# 12 12 1357 1488 2427 1469 2332
# 13 13 1019 1787 899 1371 866
# 14 14 1436 1140 2176 1570 1615
# 15 15 1662 1476 929 1023 887
Using dcast from data.table
library(data.table)
dcast(setDT(data), hybrid ~ measurement, sum)
# hybrid 1 2 3 4 5
# 1: 1 1584 878 1419 1412 1812
# 2: 2 1820 1742 804 910 506
# 3: 3 2193 1976 753 851 664
# 4: 4 1206 1535 1530 2273 1265
# 5: 5 845 990 1096 1795 1309
# 6: 6 1831 1843 1306 1158 2499
# 7: 7 1008 1434 1015 2062 1712
# 8: 8 1045 1278 1583 1028 1765
# 9: 9 913 1317 1500 957 1449
#10: 10 1037 556 1746 1025 1665
#11: 11 1620 638 1050 340 1283
#12: 12 1357 1488 2427 1469 2332
#13: 13 1019 1787 899 1371 866
#14: 14 1436 1140 2176 1570 1615
#15: 15 1662 1476 929 1023 887
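In both approaches the repeated measurements per hybrid are aggregated (here with sum). If, say, the mean per hybrid and measurement is what the regression actually needs, a sketch of the same pivot_wider() call would be:
pivot_wider(data, id_cols = hybrid, names_from = measurement,
            values_from = value, values_fn = mean)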

Calculate row similarity percentage pairwise and add it as a new column

I have a data frame like the sample below. I would like to find similar rows (not duplicates) and calculate their pairwise similarity. I found this solution, but I would like to keep all my columns and add the similarity percentage as a new variable. My aim is to find the records with the highest similarity percentage. How could I do it?
sample data set
df <- tibble::tribble(
~date, ~user_id, ~Station_id, ~location_id, ~ind_id, ~start_hour, ~start_minute, ~start_second, ~end_hour, ~end_minute, ~end_second, ~duration_min,
20191015, 19900234, 242, 2, "ac", 7, 25, 0, 7, 30, 59, 6,
20191015, 19900234, 242, 2, "ac", 7, 31, 0, 7, 32, 59, 2,
20191015, 19900234, 242, 2, "ac", 7, 33, 0, 7, 38, 59, 6,
20191015, 19900234, 242, 2, "ac", 7, 39, 0, 7, 40, 59, 2,
20191015, 19900234, 242, 2, "ac", 7, 41, 0, 7, 43, 59, 3,
20191015, 19900234, 242, 2, "ac", 7, 44, 0, 7, 45, 59, 2,
20191015, 19900234, 242, 2, "ac", 7, 47, 0, 7, 59, 59, 13,
20191015, 19900234, 242, 2, "ad", 7, 47, 0, 7, 59, 59, 13,
20191015, 19900234, 242, 2, "ac", 8, 5, 0, 8, 6, 59, 2,
20191015, 19900234, 242, 2, "ad", 8, 5, 0, 8, 6, 59, 2,
20191015, 19900234, 242, 2, "ac", 8, 7, 0, 8, 8, 59, 2,
20191015, 19900234, 242, 2, "ad", 8, 7, 0, 8, 8, 59, 2,
20191015, 19900234, 242, 2, "ac", 16, 26, 0, 16, 55, 59, 30,
20191015, 19900234, 242, 2, "ad", 16, 26, 0, 16, 55, 59, 30,
20191015, 19900234, 242, 2, "ad", 17, 5, 0, 17, 6, 59, 2,
20191015, 19900234, 242, 2, "ac", 17, 5, 0, 17, 23, 59, 19,
20191015, 19900234, 242, 2, "ad", 17, 7, 0, 17, 15, 59, 9,
20191015, 19900234, 242, 2, "ad", 17, 16, 0, 17, 22, 59, 7,
20191015, 19900234, 264, 2, "ac", 17, 24, 0, 17, 35, 59, 12,
20191015, 19900234, 264, 2, "ad", 17, 25, 0, 17, 35, 59, 11,
20191016, 19900234, 242, 1, "ac", 7, 12, 0, 7, 14, 59, 3,
20191016, 19900234, 242, 1, "ad", 7, 13, 0, 7, 13, 59, 1,
20191016, 19900234, 242, 1, "ac", 17, 45, 0, 17, 49, 59, 5,
20191016, 19900234, 242, 1, "ad", 17, 46, 0, 17, 48, 59, 3,
20191016, 19900234, 242, 2, "ad", 7, 14, 0, 8, 0, 59, 47,
20191016, 19900234, 242, 2, "ac", 7, 15, 0, 8, 0, 59, 47
)
Function for comparing rows
row_cf <- function(x, y, df) {
  # share of columns in which rows x and y of df hold identical values
  sum(df[x, ] == df[y, ]) / ncol(df)
}
Function output
# 1) Create all possible row combinations
# 2) Rename
# 3) Run through each row
# 4) Calculate similarity
expand.grid(1:nrow(df), 1:nrow(df)) %>%
  rename(row_1 = Var1, row_2 = Var2) %>%
  rowwise() %>%
  mutate(similarity = row_cf(row_1, row_2, df))
# A tibble: 676 x 3
row_1 row_2 similarity
<int> <int> <dbl>
1 1 1 1
2 2 1 0.75
3 3 1 0.833
4 4 1 0.75
5 5 1 0.75
6 6 1 0.75
7 7 1 0.75
8 8 1 0.667
9 9 1 0.583
10 10 1 0.5
Edit:
I would like to find similar rows in the data like here
Using your "function output", call it sim. Eliminate the self-comparisons and then keep the max similarity row grouped by row_1:
sim = sim %>%
filter(row_1 != row_2) %>%
group_by(row_1) %>%
slice(which.max(similarity))
Then you can add these to your original data:
df %>% mutate(row_1 = 1:n()) %>%
left_join(sim)
The row_2 column gives the row number of the most similar row, and similarity gives its similarity score. (You may want to improve these column names.)
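For example, a sketch with an explicit join key and hypothetical clearer names:
library(dplyr)
df %>%
  mutate(row_1 = 1:n()) %>%
  left_join(sim, by = "row_1") %>%
  rename(most_similar_row = row_2,        # hypothetical clearer names
         similarity_score = similarity)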

How to set to 0 all values that appear less than k times in variables within nested df

library(tidyverse)
ex <- structure(list(group = c("Group A", "Group B", "Group C"), data = list(
structure(list(a = c(25.1, 15.1, 28.7, 29.7, 5.3, 3.4, 5.3,
10.1, 2.4, 18, 4.7, 22.1, 9.5, 3.1, 26.5, 5.1, 24, 22.5,
19.4, 22.9, 24.5, 18.2, 7.9, 5.3, 24.7), b = c(95.1, 51,
100, 94.1, 47.3, 0, 50.7, 45.8, 40.7, 49.4, 51.9, 76.4, 26.7,
19.8, 37.4, 59.4, 59.1, 60.2, 26.1, 2.8, 100, 40.7, 56.4,
42.5, 0), c = c(39.9, 42.7, 16.3, 11.1, 56.9, 17.8, 62, 28.1,
43, 44.8, 54.8, 8.7, 5.5, 40.2, 7.7, 60.7, 24.8, 7.5, 3.5,
16.9, 31.6, 45.8, 76.7, 58.6, 15.8), d = c(-2.39999999999999,
28.6, -4.59999999999999, -1.39999999999999, 10.3, 3.1, 23.4,
-43, -36.3, 32.4, 33.1, 9.8, 1.5, -17.6, 16.6, 20.9, 7.8,
-1.7, -23.3, 0, -15, 59.3, -40.2, 46.9, 4.7)), .Names = c("a",
"b", "c", "d"), row.names = c(NA, -25L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(a = c(5, 4.7, 30.3,
14.3, 31.6, 6, 4.9, 23.3, 26.9, 16.9, 27.2, 23.8, 19.9, 28.6,
9.9, 17.4, 14.3, 12.5, 30.4, 30.3, 30, 6, 18, 23.7, 5.1),
b = c(48.9, 41.3, 20.1, 63.7, 85.1, 30.3, 52.8, 49.7,
27.1, 51.6, 21.8, 52.4, 52.5, 59.6, 13.7, 53.1, 69, 66.9,
23.4, 35.4, 45.8, 23.7, 62.9, 90.3, 59.6), c = c(37.4,
18.5, 64.6, 13.5, 7.8, 6.8, 12.7, 8.5, 7.8, 5.4, 14.1,
20.5, 10.9, 10.5, 7.5, 14.7, 6.9, 0.699999999999999,
4.7, 1.9, 11.9, 0.9, 7.2, 9.2, 42.2), d = c(4.9, -3.7,
13.5, 21.9, -2.69999999999999, 6.6, 0.5, -12.3, 38.7,
-25.8, -18, 28.4, 38.3, -3.6, 39.4, 19, 23.4, -38.7,
17, 36.3, -31.7, -9.3, -10.5, 9.7, -10.6)), .Names = c("a",
"b", "c", "d"), row.names = c(NA, -25L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(a = c(29.9, 12.8, 23.9,
26.2, 27.5, 32.6, 33.2, 24.8, 29, 22.6, 4.7, 25.6, 4.7, 13.1,
25.9, 14.5, 23.5, 26.6, 12.8, 24.1, 9.1, 31.9, 24.8, 4.6,
17.9), b = c(63.7, 23.3, 71.2, 46.7, 30.6, 49.3, 14.6, 68.4,
27.9, 49.1, 60.5, 26.4, 56.9, 55.4, 37.9, 40.7, 32.7, 68.5,
42.7, 27.9, 67.5, 43.4, 76.6, 53.3, 26.8), c = c(1.6, 32,
18.6, 14, 0.5, 7.2, 27.3, 8.9, 11, 15.5, 16.7, 16.4, 63.1,
14.7, 6.8, 9, 3.1, 11.7, 11, 11.5, 10.6, 14.9, 7.1, 13.2,
5.1), d = c(-35.4, 21, 12, 1.8, 37.6, 9.2, 17.6, 0, -19.4,
32.6, -32, -3.6, 7.2, -25.7, 9.1, -8, 35.8, 24.8, -13.9,
-21.7, -28.7, 0.200000000000003, -16.9, -26.5, 26.2)), .Names = c("a",
"b", "c", "d"), row.names = c(NA, -25L), class = c("tbl_df",
"tbl", "data.frame"))), h_candidates = list(structure(c(0.17320508075689, 2.37782856461527, 2.94890646051978, 3.35205778704499, 3.66771041547043, 3.95224618679369), .Names = c("0%", "0.01%", "0.02%", "0.03%", "0.04%", "0.05%")), structure(c(0.316227766016836, 2.63452963884554, 3.2327619513522, 3.63593179253957, 3.97743636027027, 4.22137418384109), .Names = c("0%", "0.01%", "0.02%", "0.03%", "0.04%", "0.05%")), structure(c(0.316227766016837, 2.7258026340878, 3.24807635378234, 3.62353418639869, 3.92683078321437, 4.17731971484109), .Names = c("0%", "0.01%", "0.02%", "0.03%", "0.04%", "0.05%"))), assignment = list(
structure(list(`0%` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25),
`0.01%` = c(1, 2, 3, 3, 4, 5, 4, 6, 7, 8, 9, 10, 11,
12, 13, 4, 14, 15, 16, 17, 18, 19, 20, 21, 17), `0.02%` = c(1,
2, 3, 3, 4, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 4, 14,
15, 16, 17, 18, 19, 20, 21, 17), `0.03%` = c(1, 2, 3,
3, 4, 5, 4, 6, 7, 8, 9, 10, 11, 12, 13, 4, 10, 14, 15,
16, 17, 18, 19, 9, 16), `0.04%` = c(1, 2, 3, 4, 5, 6,
5, 7, 8, 9, 10, 11, 12, 13, 14, 5, 11, 15, 16, 17, 18,
19, 20, 10, 17)), .Names = c("0%", "0.01%", "0.02%",
"0.03%", "0.04%"), row.names = c(NA, -25L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(`0%` = c(1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25), `0.01%` = c(1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 4, 17, 18, 19, 20, 21, 22,
23, 24), `0.02%` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 13, 4, 16, 17, 9, 18, 19, 14, 20, 21), `0.03%` = c(1,
2, 3, 4, 5, 6, 2, 7, 8, 9, 10, 11, 12, 13, 14, 12, 4, 15,
6, 8, 16, 17, 13, 18, 19), `0.04%` = c(1, 2, 3, 4, 5, 6,
2, 7, 8, 9, 10, 11, 12, 13, 14, 12, 4, 15, 6, 8, 7, 16, 13,
17, 1)), .Names = c("0%", "0.01%", "0.02%", "0.03%", "0.04%"
), row.names = c(NA, -25L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(`0%` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
), `0.01%` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 12, 15, 16, 17, 15, 18, 19, 4, 20, 21, 22), `0.02%` = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 5, 10, 11, 12, 13, 11, 14, 5, 15,
14, 16, 17, 18, 8, 19, 20), `0.03%` = c(1, 2, 3, 4, 5, 6,
7, 3, 8, 9, 10, 11, 12, 10, 11, 13, 5, 14, 13, 8, 10, 4,
3, 13, 6), `0.04%` = c(1, 2, 3, 4, 5, 5, 6, 3, 7, 8, 9, 10,
11, 9, 10, 12, 5, 13, 12, 7, 9, 4, 3, 12, 5)), .Names = c("0%",
"0.01%", "0.02%", "0.03%", "0.04%"), row.names = c(NA, -25L
), class = c("tbl_df", "tbl", "data.frame")))), .Names = c("group", "data", "h_candidates", "assignment"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
With the data structured as above, I would like to change all values within the assignment data.frames that appear less than k times (let's say k = 5) in a column.
So I need a solution that takes each data.frame in turn, then each column within a data.frame, checks which values appear less than 5 times in that column and, if there are any, replaces them with 0.
Ideally, the solution would involve tidyverse functions. I think that nested purrr::map, as well as dplyr::mutate, are needed here, but I don't know how to count appearances within a column and then replace the values.
You can use purrr::map() to loop over the list column with the dataframes,
and then purrr::modify() to loop over each column in each dataframe. Then
it's just a matter of defining a function that counts occurrences of values in
a vector, and replaces them if the count is less than k:
library(tidyverse)
ex %>%
  mutate(assignment = map(assignment, modify, function(x, k) {
    n <- table(x)[as.character(x)]
    replace(x, n < k, 0)
  }, k = 5))
#> # A tibble: 3 x 4
#> group data h_candidates assignment
#> <chr> <list> <list> <list>
#> 1 Group A <tibble [25 x 4]> <dbl [6]> <tibble [25 x 5]>
#> 2 Group B <tibble [25 x 4]> <dbl [6]> <tibble [25 x 5]>
#> 3 Group C <tibble [25 x 4]> <dbl [6]> <tibble [25 x 5]>
We can also define a couple of helper functions to make this more readable:
# Replace elements in x given by f(x) with val
replace_if <- function(x, f, val, ...) {
  replace(x, f(x, ...), val)
}
appears_less_than <- function(x, k) {
  table(x)[as.character(x)] < k
}
Combining these two functions gets what we are after:
replace_if(c(1, 1, 2, 3), appears_less_than, k = 2, 0)
#> [1] 1 1 0 0
Now all that remains is to put the pieces together:
res <- ex %>%
mutate(assignment = map(assignment, modify, replace_if,
appears_less_than, k = 3, 0))
As #thothal mentioned, there aren't any values in your data that occur more
than 4 times, but with k = 3 we can have a look at the result
(to illustrate, just the 3rd dataframe in assignment):
res %>% pluck("assignment", 3)
#> # A tibble: 25 x 5
#> `0%` `0.01%` `0.02%` `0.03%` `0.04%`
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0 0 0 0
#> 2 0 0 0 0 0
#> 3 0 0 0 3 3
#> 4 0 0 0 0 0
#> 5 0 0 5 0 5
#> 6 0 0 0 0 5
#> 7 0 0 0 0 0
#> 8 0 0 0 3 3
#> 9 0 0 0 0 0
#> 10 0 0 5 0 0
#> # ... with 15 more rows
Finally, we could also use a scoped mutate_at() to further reduce some of
the excess syntax:
ex %>%
mutate_at(vars(assignment), map, modify,
replace_if, appears_less_than, k = 3, 0)
Created on 2018-08-08 by the reprex package (v0.2.0.9000).
This should do the trick:
library(tidyverse)
ex %>%
  mutate(
    assignment = map(assignment,
                     ~ rowid_to_column(.x, "id") %>%
                       gather(key, value, -id) %>%
                       group_by(key) %>%
                       add_count(value) %>%
                       mutate(value = ifelse(n < 5, 0, value)) %>%
                       select(-n) %>%
                       spread(key, value) %>%
                       select(-id)
    )
  )
Note in your example there is no single value appearing more than 4 times.
Explanation
You map over all assignment data.frames
For each data.frame you first add an id column (needed for gather/spread)
Then you gather all columns but id into a key (the former column names), value (the values) pair
For each group of former columns (now in key) you add a counter of the values in value
Then you replace occurrences which appear less than 5 times by 0 (see the toy sketch after this list)
You remove n (the counter)
spread the data back into the original format
Remove the id column
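To see the counting/replacement step in isolation, here is a toy sketch (not from the original answer; k = 2 purely for illustration):
library(dplyr)
tibble(value = c(1, 1, 2, 3)) %>%
  add_count(value) %>%                         # n = 2, 2, 1, 1
  mutate(value = ifelse(n < 2, 0, value)) %>%  # zero out values seen fewer than 2 times
  select(-n)
# value is now 1, 1, 0, 0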

sem.mi or runMI

I am running a path analysis in lavaan (with an ordinal variable) and would like to use imputed data.
But whether I impute the data separately and use runMI, or let the original data be imputed as part of the sem.mi command, I get the same error:
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
If I run:
options(expressions = 100000)
the error message changes to: Error: protect(): protection stack overflow
I tried to change
--max-ppsize=500000
but from the command line I can't launch rstudio.exe (it says "the system cannot find the path specified", even though I double-checked the path):
C:\Program Files\RStudio\bin\rstudio.exe --max-ppsize=500000
What can I do to run my analysis with imputed data or to impute it as a part of the path analyses estimation?
Here is my code:
imp <- mice(dat2, m = 5, print = FALSE)
imputedData <- NULL
for (i in 1:5) {
  imputedData[[i]] <- complete(x = imp, action = i, include = FALSE)
}
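# Side note, a sketch assuming mice >= 3.0: the loop above can be replaced by a
# single call that returns all completed data sets as a list.
# imputedData <- complete(imp, action = "all")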
model5 <- 'ceadiff ~ mompa + cdpea + momabhx
mompa ~ b1*peadiff + c*momabhx + cdpea + b2*mommhpsi
peadiff ~ a1*momabhx + mommhpsi
cdpea ~ momabhx + mommhpsi
mommhpsi ~ a2*momabhx
peadiff ~~ cdpea
direct := c
indirect1 := a1 * b1
indirect2 := a2 * b2
total := c + (a1 * b1) + (a2 * b2)'
fit5 <- runMI(model5, data = imputedData, fun="sem", ordered = "mompa")
summary(fit5, standardized = TRUE, fit = TRUE, ci = T)
# or:
fit5 <- sem.mi(model5, data = dat2, m=5, ordered = "mompa")
summary(fit5, standardized = TRUE, fit = TRUE, ci = T)
P.S. It prints a summary with a warning in this scenario but doesn't print p-values or CIs, so I cannot determine which coefficients are significant:
fit5 <- sem.mi(model5, data = dat2, m=5, ordered = "mompa")
summary(fit5)
** WARNING ** lavaan (0.5-23.1097) model has NOT been fitted
** WARNING ** Estimates below are simply the starting values
Thank you!
P.S. I don't know how to supply my data sample.
Here is the unimputed data output:
> dput(dat2)
structure(list(id = structure(c(145, 253, 189, 305, 149, 567,
151, 853, 272, 67, 111, 695, 1695, 1301, 2322, 1335, 1490, 580,
209, 1109, 1317, 812, 1459, 2150, 685, 1583, 839, 2156, 1627,
1103, 649, 2294, 1712, 1711, 793, 1425, 1114, 146, 1529, 985,
1889, 1974, 444, 1664, 1569, 859, 1947, 1219, 1427, 1533, 2143,
769, 256, 147, 1393, 1847, 1967, 1651, 1084, 1343, 996, 1765,
1596, 2157, 978, 1448, 915, 1411, 1412, 675, 1876, 53, 400, 2103,
1028, 663, 1090, 360, 2134, 1937, 1061, 1823, 935, 891, 1968,
34, 487, 207, 295, 1118, 1164, 1053, 1511, 777, 1760, 38, 480,
459, 307, 1962, 199, 499, 1375, 782, 1855, 1624, 109, 1481, 483,
536, 972, 1151, 19, 403, 543, 502, 2251, 254, 429, 2118, 1272,
1995, 982, 1748, 1641, 1994, 1718, 510, 494, 273, 602, 549, 293,
1796, 1497, 1197, 1874, 1179, 159, 205, 242, 299, 100, 1200,
579, 870, 1482, 2131, 33, 1319, 148, 1297, 626, 1051, 1948, 1057,
1581, 1349, 1284, 1178, 1178, 1044, 1001, 547, 276, 507, 871,
698, 1006, 1946, 2101, 68, 265, 1186, 1895, 1864, 1884, 1553,
1761, 2171, 168, 30, 1132, 1983, 1897, 1383, 1353, 1697, 1752,
505, 1605, 1144, 1358, 1052, 1645, 1346, 14, 439, 2154, 932,
971, 2104, 1345, 1821, 52, 1642, 1661, 1835, 1232, 2132, 809,
606, 54, 528, 59, 1848, 232, 1750, 2340, 882, 716, 2105, 711,
2109, 2353, 41, 2144, 552, 304, 2404, 1527, 1980, 927, 1586,
1805, 1982, 1181, 2163, 861, 198, 1404, 986, 1404, 238, 2115,
1125), format.spss = "F4.0", display_width = 11L), peadiff = structure(c(4,
7, 2, 2, 3, 4, 5, 5, 2, 6, 2, 6, 4, 3, 4, 5, 2, 3, 2, 1, 1, 3,
3, 3, 3, 5, 6, 3, 2, 2, 2, 4, 2, 2, 3, 5, 2, 4, 6, 2, 2, 3, 2,
1, 7, 7, 2, 5, 6, 4, 4, 4, 2, 9, 3, 4, 6, 7, 3, 3, 4, 3, 7, 5,
7, 4, 1, 1, 6, 14, 6, 2, 4, 3, 6, 4, 6, 7, 8, 5, 3, 4, 5, 1,
5, 4, 4, 9, 6, 3, 4, 3, 6, 6, 3, 1, 2, 2, 5, 4, 4, 1, 1, 3, 3,
3, 3, 7, 5, 4, 3, 4, 3, 4, 3, 4, 4, 4, 6, 3, 1, 1, 6, 4, 6, 9,
2, 3, 3, 7, 4, 1, 2, 9, 2, 3, 6, 1, 5, 3, 8, 4, 0, 4, 4, 6, 2,
4, 2, 7, 6, 8, 5, 3, 10, 3, 1, 4, 6, 6, 6, 5, 4, 5, 3, 7, 3,
4, 8, 4, 7, 4, 15, 4, 0, 2, 5, 3, 3, 3, 5, 7, 4, 7, 5, 2, 3,
2, 8, 5, 2, 5, 4, 5, 2, 4, 3, 3, 5, 4, 4, 3, 5, 2, 4, 3, 2, 1,
6, 2, 8, 2, 6, 3, 0, NA, 6, 3, 4, 2, 9, 3, 4, 4, 2, 12, 5, 4,
0, 2, 2, 5, 2, 1, 3, 3, 4, 3, 2, 4, 7, 9, 5, 4, 6, 8), format.spss = "F8.2", display_width = 10L),
ceadiff = structure(c(5, 4, 2, 1, 2, 2, 3, 4, 3, 4, 0, 2,
2, 1, 4, 2, 6, 4, 2, 2, 2, 3, 4, 2, 6, 4, 4, 4, 5, 3, 2,
4, 4, 3, 1, 7, 3, 6, 8, 2, 3, 2, 2, 1, 4, 5, 0, 4, 2, 3,
4, 4, 1, 5, 3, 1, 4, 3, 5, 2, 0, 4, 0, 5, 4, 2, 4, 3, 2,
7, 7, 0, 5, 0, 4, 5, 2, 4, 4, 3, 2, 4, 2, 2, 3, 4, 4, 3,
1, 3, 4, 6, 8, 2, 2, 5, 2, 6, 6, 2, 4, 0, 2, 4, 2, 2, 2,
5, 2, 2, 7, 6, 3, 6, 4, 8, 2, 2, 5, 1, 1, 1, 2, 1, 3, 3,
4, 3, 5, 8, 2, 1, 4, 3, 1, 3, 5, 5, 2, 4, 4, 5, 1, 1, 8,
6, 1, 4, 12, 5, 7, 8, 3, 6, 5, 6, 3, 5, 4, 3, 3, 4, 6, 4,
2, 6, 2, 3, 4, 2, 7, 4, 7, 4, 3, 0, 3, 0, 2, 2, 1, 3, 5,
1, 4, 2, 1, 2, 7, 4, 4, 4, 8, 6, 2, 6, 1, 1, 5, 3, 0, 5,
8, 4, 8, 3, 0, 3, 4, 5, 5, 2, 6, 0, 6, NA, 4, 4, 1, 3, 12,
2, 0, 4, 0, 5, 4, 3, 2, 1, 1, 5, 5, 6, 3, 1, 2, 1, 4, 2,
8, 6, 3, 0, 1, 3), format.spss = "F8.2", display_width = 10L),
cdpea = structure(c(22, 18, 17, 13, 19, 20, 19, 20, 17, 17,
17, 14, 17, 15, 21, 12, 16, 15, 14, 17, 19, 18, 17, 18, 19,
16, 18, 15, 16, 18, 17, 19, 18, 15, 16, 18, 18, 17, 22, 18,
18, 12, 19, 16, 15, 17, 14, 17, 15, 19, 17, 18, 14, 17, 19,
20, 16, 6, 12, 17, 17, 16, 13, 20, 18, 16, 16, 18, 21, 17,
21, 13, 17, 14, 18, 15, 18, 17, 23, 19, 17, 18, 15, 17, 19,
15, 21, 17, 20, 16, 15, 18, 15, 18, 17, 18, 16, 18, 21, 16,
19, 21, 18, 16, 19, 18, 18, 18, 18, 18, 19, 20, 20, 22, 14,
19, 18, 16, 22, 14, 16, 17, 18, 15, 16, 19, 16, 19, 18, 18,
15, 18, 19, 16, 16, 18, 15, 13, 12, 20, 19, 18, 19, 13, 19,
19, 16, 20, 18, 18, 18, 18, 18, 18, 19, 15, 14, 18, 16, 15,
15, 18, 18, 18, 18, 20, 17, 16, 19, 18, 19, 17, 18, 18, 16,
16, 18, 15, 19, 19, 17, 17, 16, 15, 15, 15, 17, 12, 17, 17,
19, 14, 21, 19, 19, 18, 23, 18, 21, 18, 16, 17, 18, 13, 14,
17, 18, 16, 18, 16, 18, 18, 17, 17, 6, 22, 17, 18, 20, 18,
10, 18, 15, 10, 16, 16, 18, 18, 17, 21, 18, 18, 15, 13, 15,
17, 12, 16, 16, 16, 15, 20, 17, 14, 17, 17), format.spss = "F8.2", display_width = 10L),
mompa = structure(c(0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0,
0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1,
0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0,
1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0,
0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0,
1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1,
0, 0, 1, 0, 0), format.spss = "F8.2", display_width = 10L),
momabhx = structure(c(0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1,
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1,
0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1,
1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0,
0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1,
1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0,
1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1,
1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0,
1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 0, 1), format.spss = "F8.2", display_width = 10L),
capiabr1 = structure(c(36, 43, NA, NA, 90, 95, 128, 137,
136, 245, 322, 154, 87, 111, 181, 278, 173, 137, 69, 24,
27, 70, 34, 27, 11, 53, 31, 49, 14, 54, 131, 35, 43, 43,
60, 58, 55, 60, 18, 38, 76, 98, 41, 20, 117, 58, 98, 10,
16, 101, 120, 165, 44, 96, 23, 19, 53, 57, 77, 41, 53, 100,
90, 96, 91, 29, 54, 134, 134, 105, 106, NA, 125, 61, 72,
34, 215, 42, NA, 106, 47, 45, 107, 208, 191, NA, 50, 56,
222, 47, 89, 134, 204, 211, 228, NA, 24, 34, 34, 135, 174,
112, 239, 104, 102, 129, 71, 100, 159, 280, 97, 105, NA,
56, 76, 120, 176, 89, 154, 46, 59, 214, 53, 245, 197, 60,
425, 25, 62, 137, 199, 171, 191, 46, 49, 117, 183, 79, 47,
76, NA, 158, 151, 47, 70, 118, 198, 94, 43, 296, 108, 56,
277, 214, 331, NA, 293, 277, 41, 134, 134, 283, 87, 96, 126,
305, 152, 82, 308, 168, 274, NA, 48, 171, 98, 90, 84, 257,
144, 255, NA, 106, 67, 184, 173, 156, 243, 357, 116, 132,
226, 260, 308, 358, 225, 312, 102, 244, 87, 176, 270, 224,
136, 243, NA, 117, 234, 280, 133, 143, 234, 273, NA, 169,
145, 310, 255, 280, 58, 152, 239, 254, 322, 342, 288, NA,
155, 179, 206, 270, 173, 319, 194, 206, 319, 111, 408, 310,
324, 296, 288, 391, 409, 379, 311, 338), format.spss = "F3.0", display_width = 11L),
cbclint = structure(c(51, 55, NA, NA, 65, 57, 46, 58, 53,
56, 75, 65, 33, NA, 65, NA, 51, 65, 34, 60, 45, 29, 43, 37,
65, 49, 56, 64, 53, 51, 39, 43, 64, 61, 74, 29, 60, 53, 45,
43, 45, 49, 47, 47, 66, 57, 73, 41, 56, 37, 65, 45, 53, 60,
53, 33, 43, 51, 53, 45, 47, 59, NA, 47, 79, 68, 56, 66, 70,
47, 63, 61, 61, 56, 33, 53, 56, 43, 51, 55, 51, 73, 56, 88,
56, 59, 30, 54, 82, 50, 63, 51, 58, 37, 67, 58, 51, 52, 40,
72, 63, NA, 43, 56, 60, 48, 66, NA, 55, 47, 61, 56, 55, 51,
55, 40, 64, 40, 66, 76, 45, 63, 53, 47, 51, 70, 80, 40, 53,
51, 43, 54, 64, 53, 64, 58, 56, 60, 55, 40, 40, 49, 48, 41,
47, 56, 60, 53, 55, 49, 55, 33, 67, 58, 41, 46, 67, 63, 64,
73, 73, 60, 49, 40, 51, 45, 53, 49, 65, 54, 58, 51, 68, 45,
41, 53, 60, 55, 61, 66, 69, 66, 67, 70, 66, NA, 56, 58, 61,
67, 73, 47, 74, 65, 62, 72, 59, 60, 73, 64, 48, 56, 53, 81,
65, 65, 65, 65, 59, 56, 70, 68, 63, 64, 74, 60, 75, 58, 63,
43, 72, 69, 59, 71, 71, 64, 66, 63, 46, 66, 66, 66, 53, NA,
73, 68, 65, 68, 62, 57, 68, 69, 74, 65, 78, 47), format.spss = "F8.0", display_width = 10L),
bpsidrr1 = structure(c(NA, 21, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 18, NA, NA, NA, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 9, 8, 9, 10, 10, 10, 11,
11, 11, 9, 11, 8, 11, 9, 10, 12, 11, 13, 10, 8, 11, 10, 13,
12, 14, 9, 10, 13, 11, 11, 10, 13, 13, 13, 12, 10, 11, 13,
10, 13, 16, 12, 15, 10, 12, 13, 13, 11, 14, 15, 13, 13, 14,
13, 14, 13, 18, 13, 14, 14, 14, 15, 16, 17, 16, 14, 15, 14,
14, 15, 14, 20, 16, 16, 13, 17, 16, 15, 14, 16, 18, 17, 17,
19, 14, 17, 16, 16, 17, 16, 14, 14, 15, 17, 18, 17, 14, 14,
18, 17, 19, 16, 16, 17, 18, 15, 19, 16, 21, 18, 17, 19, 15,
20, 18, 19, 16, 18, 23, 15, 18, 20, 19, 12, 12, 21, 16, 17,
17, 20, 20, 19, 19, 22, 20, 19, 22, 14, 19, 19, 23, 19, 20,
19, 19, 20, 20, 23, 18, 19, 25, 20, 23, 20, 21, 22, 21, 21,
24, 22, 24, 22, 22, 18, 23, 24, 22, 22, 24, 21, 23, 21, 20,
21, 23, 23, 25, 24, 22, 23, 26, 23, 26, 26, 23, 26, 26, 23,
25, 24, 22, 27, 25, 24, 27, 23, 25, 25, 26, 23, 27, 30, 28,
29, 27, 31, 34, 32, 31, 34), format.spss = "F2.0", display_width = 11L),
ecbiir1 = structure(c(177, 197, 148, 133, 172, 133, 129,
NA, 159, 67, 141, 167, 111, 190, 174, NA, 137, 93, 99, 136,
54, 36, 36, 75, 126, 97, 68, 205, 110, NA, 109, 47, 93, 200,
183, 42, 73, 132, 82, 91, 154, 157, 82, 124, 207, 84, 188,
76, 104, 73, 185, 108, 140, 183, 52, 48, 100, 110, 109, 56,
88, 69, 189, 82, 210, 159, 68, 144, 119, 81, 190, 180, 199,
206, 72, 153, 151, NA, 115, 111, NA, 161, 118, 159, 127,
124, 136, 174, 232, 48, 161, 54, 74, 53, NA, 112, 148, 135,
137, 159, 75, 74, 36, 101, 142, 83, 132, 99, 141, 117, 117,
134, 105, 134, 147, 54, 206, 170, 69, 134, 64, 55, 129, 79,
110, 173, 159, 113, 163, 139, 111, 103, 93, 86, 179, 144,
167, 118, 124, 118, 91, 166, 66, 127, 54, 177, 108, 125,
115, 142, 130, 156, 152, 51, 132, 76, 155, 185, 148, 132,
146, 147, 134, 50, 158, 143, 142, 98, 111, 150, 138, NA,
221, 150, 167, 145, 146, 63, 201, 195, 192, 183, 168, 162,
170, NA, 87, 119, 171, 136, 66, 183, 162, NA, 168, 153, 151,
109, 147, 214, 156, 147, 148, 117, NA, 140, 124, 165, 175,
106, 198, 141, 183, 208, 201, 139, 171, 170, 165, 116, 226,
102, 157, 182, 161, 169, 208, 144, 140, 139, 128, 174, 158,
231, 168, 181, 211, 176, 159, 180, 110, 188, 151, 206, 205,
67), format.spss = "F3.0", display_width = 11L), mommhpsi = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, 35.75, 32.75, 32.75, 32.75, 32.75, 38.5, 38.5,
32.75, 32.75, 32.75, 32.75, 34.25, 36.5, 43, 43, 49, 33,
38, NA, 33.5, 36.5, 36.75, 43.75, NA, 33.75, 50, 35.75, 49.25,
34, 39, 45.25, 50.75, 50, NA, NA, 34.25, 34.25, 34.25, 38.25,
42.75, NA, 34.5, 42.75, 36.25, 43, NA, 34.75, 34.75, 39.5,
39.5, 39, 48, NA, NA, 35, 35, 38.5, 50.5, NA, 41.5, 38.25,
43.5, 44.5, 43, 51.75, 44.5, NA, NA, NA, NA, 35.5, 38.5,
35.5, 38.5, 42.75, 50.25, NA, NA, NA, NA, NA, NA, 35.75,
35.75, 45, 40.5, 46, NA, NA, NA, NA, 47, 45.75, NA, NA, NA,
NA, NA, NA, NA, 47, 39.25, 50.75, 42.25, 42.25, 44.75, 44,
43.75, NA, NA, NA, NA, NA, NA, 45.75, 40.5, 38.25, 42.25,
51.75, NA, NA, NA, NA, NA, 39.75, 43.25, 50.5, 53.5, 54,
NA, 52.75, NA, 37.25, 41.5, 46.5, NA, 55.25, NA, 59.75, 42.25,
44.25, 44.25, 48.25, 47, NA, NA, NA, 46.5, 49.75, 50, 49.25,
56.25, NA, NA, NA, 39.75, 47, 44, 41, 54.75, 55.25, NA, NA,
38.25, 51, 48.75, NA, 43.75, 50.25, NA, NA, 46.25, 57, 59.75,
58.5, 62.5, 62.25, NA, NA, 46.75, 46, 56.25, 55, 55.75, 58.25,
NA, 44.75, 49.5, 46.5, 57.25, 53, 60.5, 63, NA, NA, NA, 56.75,
NA, 60.5, 43.75, 39.75, 59.25, 58.75, 57.5, 56.5, 63, NA,
NA, NA, NA, 55.5, 50, NA, 61.25, 61.5, 61, 62.75, 66.5, 57,
64.75, NA, 59.25, 68.25, 65.25, NA, 68.75, 50)), .Names = c("id",
"peadiff", "ceadiff", "cdpea", "mompa", "momabhx", "capiabr1",
"cbclint", "bpsidrr1", "ecbiir1", "mommhpsi"), row.names = c(NA,
-246L), class = "data.frame")
Your code works correctly. The problem is caused by the versions of lavaan and semTools that you are using.
Following the suggestions given here by Terrence D. Jorgensen (one of the authors of semTools), start a new session of R and reinstall the two packages as follows:
install.packages("lavaan", repos = "http://www.da.ugent.be", type = "source")
# if necessary: install.packages("devtools")
devtools::install_github("simsem/semTools/semTools")
Now the commands:
fit5 <- runMI(model5, data = imputedData, fun="sem", ordered = "mompa")
summary(fit5, standardized = TRUE, ci = T)
give the following output:
Rubin's (1987) rules were used to pool point and SE estimates across 5 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
lavaan.mi object based on 5 imputed data sets.
See class?lavaan.mi help page for available methods.
Convergence information:
The model converged on 5 imputed data sets
Parameter Estimates:
Information Expected
Information saturated (h1) model
Standard Errors Robust.sem
Regressions:
Estimate Std.Err t df P(>|z|) ci.lower ci.upper Std.lv Std.all
ceadiff ~
mompa 0.473 0.165 2.863 2016.256 0.004 0.149 0.797 0.473 0.223
cdpea 0.137 0.038 3.589 2507.509 0.000 0.062 0.212 0.137 0.157
momabhx -0.251 0.302 -0.831 Inf 0.406 -0.843 0.341 -0.251 -0.059
mompa ~
peadiff (b1) 0.108 0.035 3.091 Inf 0.002 0.039 0.176 0.108 0.245
momabhx (c) 0.548 0.165 3.324 Inf 0.001 0.225 0.871 0.548 0.273
cdpea -0.048 0.031 -1.525 Inf 0.127 -0.109 0.014 -0.048 -0.116
mommhpsi (b2) -0.022 0.009 -2.365 61.332 0.021 -0.040 -0.003 -0.022 -0.192
...
