Flat file splitting in BizTalk 2010

I have a txt file with the following data:
1, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
2, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
3, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
4, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
5, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
6, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
7, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
8, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
9, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
10, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
11, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
12, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
13, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
14, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
15, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
16, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
17, Shailender, Singh, test#test.com, 98799, 5000, New Delhi, Delhi, India
Assume there are 20 records in the txt file in the above format.
I need to create multiple txt files from this file, with at most 7 records in each.

Use the standard flat-file wizard to produce your schema, assuming an unlimited (Unbounded) number of records.
Then manually change the Max Occurs attribute of your 'Line' node (not the root node) from 'Unbounded' to '7'. BizTalk will then split the inbound document into groups of at most 7 records.
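To sanity-check what the split should produce, here is a rough sketch of the grouping arithmetic in plain R. It is an illustration only and has nothing to do with how BizTalk performs the split internally; the variable names are made up for the example.

```r
# Illustration only: plain R, not BizTalk.
# Shows how 20 records fall into groups of at most 7.
n_records <- 20
max_per_file <- 7

record_ids <- seq_len(n_records)

# Records 1-7 go to group 1, 8-14 to group 2, 15-20 to group 3.
groups <- split(record_ids, ceiling(record_ids / max_per_file))

group_sizes <- lengths(groups)
print(group_sizes)
```

With 20 records and a limit of 7 you would expect three output files, the last one holding the remaining 6 records.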

GGplot Plotting Each Point Twice

I am trying to make an animated bubble chart for a baseball league I'm in. Once I create the animated graph and convert it into a gif, it plots each team twice, as shown in the picture below. The legend should only hold 14 points/teams, but it shows 28 instead.
My code is the following:
library(ggplot2)
library(gganimate)
library(readxl)
library(gifski)
library(png)
myData <- read_excel("~/Desktop/Dynasty - Fantasy Baseball.xlsx")
# Make a ggplot, but add frame=year: one image per year
g <- ggplot(myData, aes(PF, PA, size = `W%`, color = Team)) +
geom_point() +
theme_bw() +
# gganimate specific bits:
labs(title = 'Period: {frame_time-1900}', x = 'Points For', y = 'Points Against') +
transition_time(Year) +
ease_aes('linear')
# Save as gif:
anim_save(filename = "~/Desktop/FantasyBaseballAnimated.gif", animation = g)
My data is stored in the following:
structure(list(Team = c("Houston Astros", "Miami Marlins", "New York Mets",
"Atlanta Braves", "St. Louis Cardinals", "Cincinatti Reds", "Philadelphia Reds",
"Baltimore Orioles", "Milwaukee Brewers", "Washington Nationals",
"Montreal Expos", "Tampa Bay Rays", "Seattle Mariners", "Brooklyn Dodgers",
"Houston Astros", "Miami Marlins", "New York Mets", "Atlanta Braves",
"St. Louis Cardinals", "Cincinatti Reds", "Philadelphia Reds",
"Baltimore Orioles", "Milwaukee Brewers", "Washington Nationals",
"Montreal Expos", "Tampa Bay Rays", "Seattle Mariners", "Brooklyn Dodgers",
"New York Mets ", "St. Louis Cardinals ", "Cincinatti Reds ",
"Washington Nationals ", "Atlanta Braves ", "Miami Marlins ",
"Philadelphia Phillies ", "Tampa Bay Rays ", "Houston Astros ",
"Montreal Expos ", "Baltimore Orioles ", "Milwaukee Brewers ",
"Seattle Mariners ", "Brooklyn Dodgers ", "St. Louis Cardinals ",
"Washington Nationals ", "Miami Marlins ", "Cincinatti Reds ",
"New York Mets ", "Atlanta Braves ", "Tampa Bay Rays ", "Houston Astros ",
"Milwaukee Brewers ", "Philadelphia Phillies ", "Baltimore Orioles ",
"Montreal Expos ", "Seattle Mariners ", "Brooklyn Dodgers ",
"Washington Nationals ", "St. Louis Cardinals ", "Atlanta Braves ",
"Cincinatti Reds ", "New York Mets ", "Houston Astros ", "Miami Marlins ",
"Philadelphia Phillies ", "Tampa Bay Rays ", "Milwaukee Brewers ",
"Baltimore Orioles ", "Montreal Expos ", "Seattle Mariners ",
"Brooklyn Dodgers ", "St. Louis Cardinals ", "Washington Nationals ",
"Philadelphia Phillies ", "Miami Marlins ", "Atlanta Braves ",
"New York Mets ", "Houston Astros ", "Milwaukee Brewers ",
"Cincinatti Reds ", "Tampa Bay Rays ", "Montreal Expos ",
"Baltimore Orioles ", "Seattle Mariners ", "Brooklyn Dodgers ",
"New York Mets ", "St. Louis Cardinals ", "Washington Nationals ",
"Philadelphia Phillies ", "Miami Marlins ", "Houston Astros ",
"Atlanta Braves ", "Milwaukee Brewers ", "Cincinatti Reds ",
"Tampa Bay Rays ", "Montreal Expos ", "Baltimore Orioles ",
"Seattle Mariners ", "Brooklyn Dodgers ", "St. Louis Cardinals ",
"Washington Nationals ", "Houston Astros ", "New York Mets ",
"Philadelphia Phillies ", "Milwaukee Brewers ", "Atlanta Braves ",
"Miami Marlins ", "Cincinatti Reds ", "Tampa Bay Rays ", "Baltimore Orioles ",
"Montreal Expos ", "Seattle Mariners ", "Brooklyn Dodgers "
), W = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 9, 8,
7, 6, 6, 5, 6, 5, 4, 3, 2, 2, 2, 17, 17, 16, 14, 14, 14, 12,
11, 13, 7, 7, 6, 3, 3, 25, 24, 22, 21, 20, 20, 18, 19, 16, 14,
12, 9, 8, 5, 33, 32, 27, 27, 25, 26, 25, 23, 21, 21, 16, 15,
11, 7, 37, 37, 35, 34, 33, 32, 32, 29, 29, 27, 21, 19, 17, 7,
44, 43, 43, 40, 38, 40, 37, 37, 35, 32, 25, 23, 20, 7, 52, 50,
50, 48, 48, 43, 42, 40, 41, 38, 34, 28, 25, 8), L = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 3, 4, 6, 5, 6, 5, 6,
7, 8, 9, 10, 5, 5, 7, 7, 8, 9, 9, 9, 11, 14, 15, 15, 19, 21,
8, 9, 11, 13, 13, 13, 14, 16, 17, 19, 21, 22, 26, 31, 11, 12,
16, 19, 18, 19, 20, 22, 21, 22, 28, 28, 33, 40, 18, 18, 22, 22,
22, 22, 25, 25, 28, 27, 34, 36, 38, 52, 22, 22, 22, 28, 27, 29,
28, 28, 33, 31, 42, 42, 46, 64, 25, 27, 31, 30, 32, 33, 34, 37,
39, 37, 43, 51, 53, 75), T = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2, 1, 0, 2, 2, 1,
3, 2, 1, 3, 4, 0, 3, 2, 3, 2, 0, 3, 3, 3, 2, 3, 3, 4, 1, 3, 3,
3, 5, 2, 0, 4, 4, 5, 2, 5, 3, 3, 3, 6, 5, 4, 5, 4, 1, 5, 5, 3,
4, 5, 6, 3, 6, 3, 6, 5, 5, 5, 1, 6, 7, 7, 4, 7, 3, 7, 7, 4, 9,
5, 7, 6, 1, 7, 7, 3, 6, 4, 8, 8, 7, 4, 9, 7, 5, 6, 1), `W%` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.833, 0.792, 0.75, 0.667,
0.583, 0.5, 0.5, 0.5, 0.5, 0.417, 0.333, 0.25, 0.208, 0.167,
0.75, 0.75, 0.688, 0.646, 0.625, 0.604, 0.562, 0.542, 0.542,
0.354, 0.333, 0.312, 0.167, 0.125, 0.736, 0.708, 0.653, 0.611,
0.597, 0.597, 0.556, 0.542, 0.486, 0.431, 0.375, 0.319, 0.25,
0.139, 0.729, 0.708, 0.615, 0.583, 0.573, 0.573, 0.552, 0.51,
0.5, 0.49, 0.375, 0.365, 0.271, 0.156, 0.658, 0.658, 0.608, 0.6,
0.592, 0.583, 0.558, 0.533, 0.508, 0.5, 0.392, 0.358, 0.325,
0.125, 0.653, 0.646, 0.646, 0.583, 0.576, 0.576, 0.562, 0.562,
0.514, 0.507, 0.382, 0.368, 0.319, 0.104, 0.661, 0.637, 0.613,
0.607, 0.595, 0.56, 0.548, 0.518, 0.512, 0.506, 0.446, 0.363,
0.333, 0.101), `Div Rec` = c("0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0-0-0", "0-0-0", "37470",
"0-0-0", "0-0-0", "36683", "0-0-0", "36683", "0-0-0", "0-0-0",
"0-0-0", "37295", "0-0-0", "0-0-0", "17-5-2", "0-0-0", "36683",
"0-0-0", "36712", "36653", "0-0-0", "37295", "36594", "0-0-0",
"36683", "0-0-0", "0-0-0", "0-0-0", "37106", "36801", "36653",
"37207", "20-13-3", "13-10-1", "37512", "36594", "0-0-0", "36566",
"36683", "0-0-0", "36653", "0-0-0", "19-4-1", "37106", "13-10-1",
"37207", "25-18-5", "37541", "36754", "36843", "37512", "37381",
"36683", "0-0-0", "37482", "36931", "13-9-2", "19-4-1", "23-13-0",
"17-18-1", "13-10-1", "25-18-5", "37541", "37381", "13-21-2",
"15-19-2", "36683", "36683", "14-19-3", "36943", "25-18-5", "13-9-2",
"25-8-3", "28-19-1", "17-18-1", "18-16-2", "13-10-1", "13-8-3",
"19-26-3", "15-19-2", "36813", "37541", "17-27-4", "36943", "22-12-2",
"25-8-3", "18-16-2", "25-18-5", "28-19-1", "13-8-3", "13-10-1",
"17-18-1", "19-26-3", "15-19-2", "21-13-2", "13-23-0", "17-27-4",
"3-32-1"), GB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0.5, 1, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7.5, 8, 0, 0, 1.5, 2.5, 3,
3.5, 4.5, 5, 5, 9.5, 10, 10.5, 14, 15, 0, 1, 3, 4.5, 5, 5, 6.5,
7, 9, 11, 13, 15, 17.5, 21.5, 0, 1, 5.5, 7, 7.5, 7.5, 8.5, 10.5,
11, 11.5, 17, 17.5, 22, 27.5, 0, 0, 3, 3.5, 4, 4.5, 6, 7.5, 9,
9.5, 16, 18, 20, 32, 0, 0.5, 0.5, 5, 5.5, 5.5, 6.5, 6.5, 10,
10.5, 19.5, 20.5, 24, 39.5, 0, 2, 4, 4.5, 5.5, 8.5, 9.5, 12,
12.5, 13, 18, 25, 27.5, 47), PF = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 10, 9.5, 9, 8, 7, 6, 6, 6, 6, 5, 4, 3, 2.5, 2,
18, 18, 16.5, 15.5, 15, 14.5, 13.5, 13, 13, 8.5, 8, 7.5, 4, 3,
26.5, 25.5, 23.5, 22, 21.5, 21.5, 20, 19.5, 17.5, 15.5, 13.5,
11.5, 9, 5, 35, 34, 29.5, 28, 27.5, 27.5, 26.5, 24.5, 24, 23.5,
18, 17.5, 13, 7.5, 39.5, 39.5, 36.5, 36, 35.5, 35, 33.5, 32,
30.5, 30, 23.5, 21.5, 19.5, 7.5, 47, 46.5, 46.5, 42, 41.5, 41.5,
40.5, 40.5, 37, 36.5, 27.5, 26.5, 23, 7.5, 55.5, 53.5, 51.5,
51, 50, 47, 46, 43.5, 43, 42.5, 37.5, 30.5, 28, 8.5), PA = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2.5, 3, 4, 5, 6, 6,
6, 6, 7, 8, 9, 9.5, 10, 6, 6, 7.5, 8.5, 9, 9.5, 10.5, 11, 11,
15.5, 16, 16.5, 20, 21, 9.5, 10.5, 12.5, 14, 14.5, 14.5, 16,
16.5, 18.5, 20.5, 22.5, 24.5, 27, 31, 13, 14, 18.5, 20, 20.5,
20.5, 21.5, 23.5, 24, 24.5, 30, 30.5, 35, 40.5, 20.5, 20.5, 23.5,
24, 24.5, 25, 26.5, 28, 29.5, 30, 36.5, 38.5, 40.5, 52.5, 25,
25.5, 25.5, 30, 30.5, 30.5, 31.5, 31.5, 35, 35.5, 44.5, 45.5,
49, 64.5, 28.5, 30.5, 32.5, 33, 34, 37, 38, 40.5, 41, 41.5, 46.5,
53.5, 56, 75.5), Period = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), Place = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), Year = c(1900,
1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900,
1900, 1900, 1901, 1901, 1901, 1901, 1901, 1901, 1901, 1901, 1901,
1901, 1901, 1901, 1901, 1901, 1902, 1902, 1902, 1902, 1902, 1902,
1902, 1902, 1902, 1902, 1902, 1902, 1902, 1902, 1903, 1903, 1903,
1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903,
1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904,
1904, 1904, 1904, 1905, 1905, 1905, 1905, 1905, 1905, 1905, 1905,
1905, 1905, 1905, 1905, 1905, 1905, 1906, 1906, 1906, 1906, 1906,
1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1907, 1907,
1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907,
1907)), row.names = c(NA, -112L), class = c("tbl_df", "tbl",
"data.frame"))
I thought factoring it would work, and also parsing it but neither worked:
#first thought
myData$Team <- factor(myData$Team)
summary(myData)
#second thought
myData$Team <- eval(parse(text = myData$Team))
Am I just missing something obvious? I'm drawing a blank at how I could fix this. Any help would be greatly appreciated!
It looks like you need to do some data cleaning:
library(dplyr)
myData %>% group_by(Team) %>%
summarise(count = n())
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 28 x 2
Team count
<chr> <int>
1 "Atlanta Braves" 2
2 "Atlanta Braves " 6
3 "Baltimore Orioles" 2
4 "Baltimore Orioles " 6
5 "Brooklyn Dodgers" 2
6 "Brooklyn Dodgers " 6
7 "Cincinatti Reds" 2
8 "Cincinatti Reds " 6
9 "Houston Astros" 2
10 "Houston Astros " 6
# ... with 18 more rows
Using stringr:
library(stringr)
myData <- myData %>%
mutate(Team = str_trim(Team, side = "both"))
Answer
Remove the whitespace around the names:
myData$Team <- trimws(myData$Team)
Rationale
Each team actually appears in the data twice: once with a clean name and once with a single trailing space. You may want to look into why that is happening.
table(myData$Team, myData$Year)[1:2, ]
# 1900 1901 1902 1903 1904 1905 1906 1907
# Atlanta Braves 1 1 0 0 0 0 0 0
# Atlanta Braves 0 0 1 1 1 1 1 1
sort(unique(myData$Team))[1:2]
#[1] "Atlanta Braves" "Atlanta Braves "
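A small self-contained sketch of the fix, using a made-up four-element vector rather than the full data set. One more thing worth checking: in the posted data the first two years appear to use "Philadelphia Reds" while later years use "Philadelphia Phillies ", so trimming alone would probably leave 15 distinct names instead of 14. The recode step below assumes those two names refer to the same team, which only the asker can confirm.

```r
# Toy vector mimicking the problem: trailing spaces plus one inconsistent name.
teams <- c("Atlanta Braves", "Atlanta Braves ",
           "Philadelphia Reds", "Philadelphia Phillies ")

n_before <- length(unique(teams))   # every string is distinct before cleaning

# Strip leading and trailing whitespace (base R, no extra packages needed).
cleaned <- trimws(teams)

# Assumption: "Philadelphia Reds" is a typo for "Philadelphia Phillies".
cleaned[cleaned == "Philadelphia Reds"] <- "Philadelphia Phillies"

n_after <- length(unique(cleaned))
print(unique(cleaned))
```

After cleaning, each team maps to one colour and one legend entry, so the animation should draw a single bubble per team.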

How do I categorize households based on values before a policy change and track every change after it?

I am new to R, so perhaps my question is very silly. First I would like to describe my data, and then the problem.
I have an (unbalanced) panel of monthly household consumption data from Jan 2000 to Dec 2010. In Jan 2005, the consumption tax increased from 7% to 10%. At the moment, I am trying to get a deeper understanding of the data.
For this purpose, I would like to take the average of the 12 months of consumption before the tax increase, that is, Jan 2004 to Dec 2004. Then, using this computed mean, I would like to classify households into 4 categories: first category USD 1000-2500, second category USD 2501-5000, third category USD 5001-7500, and fourth category USD 7501-10000. (In the data set the minimum monthly consumption expenditure is USD 1000 and the maximum is USD 10,000.)
Using the above categorization, I would like to check by how much expenditure has increased in Jan 2005, Feb 2005, and so on through Dec 2010 for each category. I have been struggling with this for about 3 weeks and could not figure out how to even start. I would be very grateful for any suggestions and help. Thank you so much in advance.
I am using confidential data from the tax office, so I am not able to share the actual dataset. However, I created data that is similar to it:
data2 <- structure(list(id = c(1223, 1223, 1223, 1223, 1223, 1223, 1223,
1223, 1223, 1223, 1223, 1223, 1223, 1223, 1223, 1223, 1223, 1223,
1223, 1223, 1223, 1223, 1223, 1223, 1224, 1224, 1224, 1224, 1224,
1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224,
1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224), con = c(1954,
1965, 2220, 1789, 2855, 2192, 1028, 2745, 1190, 2892, 1941, 1045,
1778, 1660, 1037, 1259, 1655, 1429, 1617, 1927, 1105, 1948, 1929,
1673, 7309, 9420, 9849, 7824, 7522, 7448, 7370, 6717, 9024, 7635,
9316, 5173, 9071, 5997, 6315, 6636, 9978, 8077, 9170, 5440, 9442,
6668, 5732, 8460), year = c(2004, 2004, 2004, 2004, 2004, 2004,
2004, 2004, 2004, 2004, 2004, 2004, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2004, 2004, 2004, 2004,
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005), month = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12)), row.names = c(NA, -48L), class = c("tbl_df",
"tbl", "data.frame"))
FYI, I use the dplyr package (part of the tidyverse) throughout this answer. I've also made the assumption that you want to compare the post-2004 consumption for each category as a whole, to the category average in 2004, rather than on an individual family basis. If this isn't correct let me know and I can alter the answer.
First, I make a separate table for the 2004 data only, and use this to calculate the mean consumption per ID for the whole year (using summarise()), then make a new column with the category each ID falls in (using mutate() and case_when()), and then calculate the mean consumption for each category.
data2_2004 <-
data2 %>%
filter(year == 2004) %>%
group_by(id) %>%
summarise(mean_con_2004_id = mean(con)) %>%
mutate(household_category = case_when(between(mean_con_2004_id, 1000, 2500) ~ "cat1",
between(mean_con_2004_id, 2501, 5000) ~ "cat2",
between(mean_con_2004_id, 5001, 7500) ~ "cat3",
between(mean_con_2004_id, 7501, 10000) ~ "cat4")) %>%
group_by(household_category) %>%
mutate(mean_con_2004_category = mean(mean_con_2004_id))
> data2_2004
# A tibble: 2 x 4
# Groups: household_category [2]
id mean_con_2004_id household_category mean_con_2004_category
<dbl> <dbl> <chr> <dbl>
1 1223 1985. cat1 1985.
2 1224 7884. cat4 7884.
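One caveat with the between() ranges above: a mean such as 2500.5 is greater than 2500 but less than 2501, so none of the four conditions matches and case_when() returns NA for that household. A possible alternative, sketched here in base R with made-up mean values, is cut() with shared break points so the categories leave no gaps. Treating 2500, 5000 and 7500 as the upper edge of the lower category is my assumption about the intended boundaries.

```r
# Hypothetical household means, including one that falls between 2500 and 2501.
mean_con <- c(1984.667, 2500.5, 5000, 7883.917)

household_category <- cut(
  mean_con,
  breaks = c(1000, 2500, 5000, 7500, 10000),
  labels = c("cat1", "cat2", "cat3", "cat4"),
  include.lowest = TRUE   # so a mean of exactly 1000 lands in cat1
)

print(data.frame(mean_con, household_category))
```

cut() returns a factor, which also keeps the categories in order when plotting or summarising.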
Then, I filter your dataframe for data after 2004, and use left_join() to merge it with the 2004 data to add the mean consumption per family ID, category, and mean consumption per category.
data2_post2004 <- data2 %>%
filter(year > 2004) %>%
left_join(data2_2004, by = "id")
> data2_post2004
# A tibble: 24 x 7
id con year month mean_con_2004_id household_category mean_con_2004_category
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 1223 1778 2005 1 1985. cat1 1985.
2 1223 1660 2005 2 1985. cat1 1985.
3 1223 1037 2005 3 1985. cat1 1985.
4 1223 1259 2005 4 1985. cat1 1985.
5 1223 1655 2005 5 1985. cat1 1985.
6 1223 1429 2005 6 1985. cat1 1985.
7 1223 1617 2005 7 1985. cat1 1985.
8 1223 1927 2005 8 1985. cat1 1985.
9 1223 1105 2005 9 1985. cat1 1985.
10 1223 1948 2005 10 1985. cat1 1985.
# ... with 14 more rows
From here, you can do whatever comparisons you want to. For example, to compare the mean consumption in each category each month to the 2004 average for that category:
data2_post2004_summary <- data2_post2004 %>%
group_by(household_category, year, month, mean_con_2004_category) %>%
summarise(mean_con = mean(con)) %>%
mutate(diff_2004 = mean_con - mean_con_2004_category) %>%
mutate(percent_diff_2004 = diff_2004/mean_con_2004_category * 100)
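To make the two derived columns concrete, here is the same arithmetic done by hand in base R for household 1223 in January 2005. In the sample data cat1 contains only this household, so the category average equals the household average; with real data the baseline would be the mean across all households in the category.

```r
# 2004 consumption for household 1223, copied from the sample data.
con_2004 <- c(1954, 1965, 2220, 1789, 2855, 2192,
              1028, 2745, 1190, 2892, 1941, 1045)

baseline <- mean(con_2004)   # 2004 average, the pre-tax reference level
jan_2005 <- 1778             # consumption in the first month after the increase

diff_2004 <- jan_2005 - baseline
percent_diff_2004 <- diff_2004 / baseline * 100

print(round(c(baseline = baseline,
              diff = diff_2004,
              percent = percent_diff_2004), 1))
```

A negative percent_diff_2004 means consumption in that month was below the 2004 average for the category.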
If you want to plot the data instead, you can convert the year + month columns to a date column before plotting.
data2_post2004_summary %>%
mutate(date = as.Date(paste(year, month, "01", sep = "-"))) %>%
ggplot(aes(x = date, y = mean_con)) +
geom_line() +
geom_line(aes(y = mean_con_2004_category), linetype = "dotted") +
facet_wrap(facets = vars(household_category))

Working across two dataframes: Apply or for-loop?

I have two dataframes and one function. For each row of the first dataframe, the function is supposed to take the variables start_month and end_month, select the matching rows from the month column of the second dataframe, and calculate the rate_of_change between start_month and end_month within each year. Finally, it should calculate mean(rate_of_change) and place it into the first dataframe as a new variable, average_ratio.
So far I've written code that calculates the average ratio, but I can't manage to put it into a for loop or an apply function that runs through the whole first data frame. I have two ideas, but neither works so far.
structure(Total) # Df containing total combinations of all existing month starting in September
.
i | start_month | end_month | average_ratio (expected output)
1 | 9 | 10 | -23
2 | 9 | 11 | 13
3 | 9 | 12 | -4
4 | 9 | 1 |
5 | 9 | 2 | # ... with 61 more rows
and
structure(Cologne)
# A tibble: 3,000 x 4
year month price town (rate of change)
<dbl> <dbl> <dbl> <chr>
1 1531 7 7575 Cologne
2 1531 8 588 Cologne
3 1531 9 615 Cologne
4 1531 10 69 Cologne -88%
5 1531 11 712 Cologne
6 1531 12 590 Cologne
7 1532 1 72 Cologne
8 1532 2 675 Cologne
9 1532 3 6933 Cologne
10 1532 4 54 Cologne
11 1532 5 425 Cologne
12 1532 6 12 Cologne
13 1532 7 323 Cologne
14 1532 8 32 Cologne
15 1532 9 58 Cologne
16 1532 10 84 Cologne 42%
# ... with 2,990 more rows
# rate of change function
rateofchange <- function(x,y) {
((x-y)/y)*100
}
# avg_ratio function
avg_ratio <- function(x,y,z) {
dt.frame <- filter(x, month==y | month==z)
pre_p <- lag(dt.frame$price, 1)
dt.frame <- cbind(dt.frame, pre_p)
for (i in 1:nrow(dt.frame)) {
dt.frame$roc <- rateofchange(dt.frame$price,dt.frame$pre_p)
}
result <- mean(dt.frame$roc,na.rm=TRUE)
return(result)
}
May_Aug <- avg_ratio(Cologne, 5,7)
################ works until here ################
# Now, Idea 1
Total <- Total %>%
mutate(Total, ratio = avg_ratio(Cologne,Total$start_mth,Total$end_mth)
)
Warning messages:
1: In month == y :
longer object length is not a multiple of shorter object length
2: In month == z :
longer object length is not a multiple of shorter object length
# and Idea 2
ratio <- c()
Total_new <- for(i in 1:nrow(Total)) {
ratio [i] <- c(ratio, avg_ratio(Cologne,Total$start_mth[i],Total$end_mth[i]))
return(cbind(Total,ratio))
}
> dput(Cologne[1:20,])
structure(list(year = c(1531, 1531, 1531, 1531, 1531, 1531, 1532,
1532, 1532, 1532, 1532, 1532, 1532, 1532, 1532, 1532, 1532, 1532,
1533, 1533), month = c(7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 1, 2), price = c(7575, 588, 615, 69, 712,
72, 72, 675, 6933, 70, 656, 66, 62, 48, 48, 462, 45, 45, 456,
46), town = c("Cologne", "Cologne", "Cologne", "Cologne", "Cologne",
"Cologne", "Cologne", "Cologne", "Cologne", "Cologne", "Cologne",
"Cologne", "Cologne", "Cologne", "Cologne", "Cologne", "Cologne",
"Cologne", "Cologne", "Cologne")), spec = structure(list(cols = list(
Jahr = structure(list(), class = c("collector_double", "collector"
)), Monat = structure(list(), class = c("collector_double",
"collector")), cologne_wheat_monthly = structure(list(), class = c("collector_number",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"), row.names = c(NA,
20L), class = c("tbl_df", "tbl", "data.frame"))
> dput(Total)
structure(list(start_mth = c(9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7), end_mth = c(10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 12, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 2, 3, 4, 5, 6, 7, 8, 3, 4, 5, 6, 7, 8, 4, 5, 6, 7, 8, 5, 6, 7, 8, 6, 7, 8, 7, 8, 8)), class = "data.frame", row.names = c(NA, -66L))
You can do:
Total$average_ratio <- mapply(avg_ratio, y = Total$start_mth, z = Total$end_mth, MoreArgs = list(x = Cologne))
Your function is not vectorized over y and z, which is why this doesn't work:
Total <- Total %>%
mutate(ratio = avg_ratio(Cologne, start_mth, end_mth))
mapply() calls the function once per pair of elements from the arguments you pass it (here y and z). You don't want to iterate over Cologne, however, so you pass it inside MoreArgs =, where it is handed to every call unchanged.
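Here is a minimal, self-contained sketch of the same pattern. count_months(), prices and pairs are hypothetical stand-ins for avg_ratio(), Cologne and Total, chosen so the result is easy to verify by eye.

```r
# Hypothetical stand-in for avg_ratio(): takes a whole data frame (x)
# plus two scalar months (y, z) and returns a single number.
count_months <- function(x, y, z) {
  sum(x$month == y | x$month == z)
}

prices <- data.frame(month = c(9, 10, 11, 9, 10, 12))
pairs  <- data.frame(start_mth = c(9, 9, 10), end_mth = c(10, 11, 12))

# y and z are walked element by element; x is passed whole to every call.
pairs$n_rows <- mapply(count_months,
                       y = pairs$start_mth,
                       z = pairs$end_mth,
                       MoreArgs = list(x = prices))
print(pairs)
```

Each row of pairs gets its own call, for example count_months(prices, 9, 10) for the first row, which is exactly what the mutate() attempt was missing.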

Creating a new column in my data frame based on a function

I have a data frame that has NFL teams and some data about them. I want to add points per game for each team, calculated up to that particular week.
I cannot just summarize the data by team, as I need to keep the individual games the way they are currently represented.
CurrYrfun <- function(Yr,Tm,Wk){
PPG <- Schedule_Results %>%
filter(Year == Yr & Team == Tm & Week < Wk) %>%
group_by(Team) %>%
summarize(APG = mean(Pts))
return(PPG[['APG']])
}
This function gives the correct result for individual records, but when I try to mutate a new column in the dataframe as below:
Schedule_Results <- Schedule_Results %>%
mutate(PPG = CurrYrfun(Year, Team, Week))
I get an error saying PPG is of length 0. I've tried to attach a picture of the dataframe, so you have an idea of the data I'm working with.
Edited to include data and examples:
Schedule_Results <- structure(list(Year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2019L,
2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L, 2019L,
2019L, 2019L, 2019L, 2019L, 2019L, 2019L), Week = c(17, 17, 17,
16, 16, 16, 15, 15, 15, 14, 14, 14, 13, 13, 13, 12, 12, 12, 11,
11, 11), Team = c("Washington Redskins", "Cincinnati Bengals",
"Jacksonville Jaguars", "Jacksonville Jaguars", "Washington Redskins",
"Cincinnati Bengals", "Cincinnati Bengals", "Washington Redskins",
"Jacksonville Jaguars", "Washington Redskins", "Cincinnati Bengals",
"Jacksonville Jaguars", "Jacksonville Jaguars", "Washington Redskins",
"Cincinnati Bengals", "Cincinnati Bengals", "Jacksonville Jaguars",
"Washington Redskins", "Washington Redskins", "Jacksonville Jaguars",
"Cincinnati Bengals"), Opp = c("Dallas Cowboys", "Cleveland Browns",
"Indianapolis Colts", "Atlanta Falcons", "New York Giants", "Miami Dolphins",
"New England Patriots", "Philadelphia Eagles", "Oakland Raiders",
"Green Bay Packers", "Cleveland Browns", "Los Angeles Chargers",
"Tampa Bay Buccaneers", "Carolina Panthers", "New York Jets",
"Pittsburgh Steelers", "Tennessee Titans", "Detroit Lions", "New York Jets",
"Indianapolis Colts", "Oakland Raiders"), Pts = c(16, 33, 38,
12, 35, 35, 13, 27, 20, 15, 19, 10, 11, 29, 22, 10, 20, 19, 17,
13, 10), Opp_Pts = c(47, 23, 20, 24, 41, 38, 34, 37, 16, 20,
27, 45, 28, 21, 6, 16, 42, 16, 34, 33, 17), Yds = c(271, 361,
353, 288, 361, 430, 315, 352, 262, 262, 451, 252, 242, 362, 277,
244, 369, 230, 225, 308, 246), Opp_Yds = c(517, 313, 275, 518,
552, 502, 291, 415, 364, 341, 333, 525, 315, 278, 271, 338, 471,
364, 400, 389, 386), TO = c(2, 1, 1, 1, 0, 1, 5, 1, 0, 1, 1,
0, 4, 0, 0, 2, 1, 2, 1, 1, 2), Opp_TO = c(1, 3, 2, 2, 0, 1, 0,
1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 4, 2, 2, 2), Home = c("1", "1",
"1", "1", "0", "1", "0", "0", "0", "1", "1", "0", "0", "0", "1",
"0", "1", "1", "0", "1", "1"), Playoffs = c(0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), win = c("0", "1",
"1", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0", "1", "1",
"0", "0", "1", "0", "0", "0")), row.names = c(NA, -21L), class = "data.frame")
CurrYrfun <- function(Yr,Tm,Wk){
PPG <- Schedule_Results %>%
filter(Year == Yr & Team == Tm & Week < Wk) %>%
group_by(Team) %>%
summarize(APG = mean(Pts))
return(PPG[['APG']])
}
CurrYrfun(2019,'Washington Redskins',13)
CurrYrfun(2019,'Jacksonville Jaguars',14)
CurrYrfun(2019,'Washington Redskins',16)
CurrYrfun(2019,'Cincinnati Bengals',15)
Schedule_Results <- Schedule_Results %>%
mutate(PPG = CurrYrfun(Year, Team, Week))
My goal is to return the output of the function for each row as a new column in the dataframe
I'm pretty sure this is what you want. I spot-checked the first couple examples you give, and they look right.
Schedule_Results %>%
group_by(Team, Year) %>%
arrange(Week) %>%
mutate(PPG = lag(cummean(Pts), 1))
# # A tibble: 21 x 14
# # Groups: Team, Year [3]
# Year Week Team Opp Pts Opp_Pts Yds Opp_Yds TO Opp_TO Home Playoffs win PPG
# <int> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
# 1 2019 11 Washington Reds~ New York Jets 17 34 225 400 1 2 0 0 0 NA
# 2 2019 11 Jacksonville Ja~ Indianapolis Co~ 13 33 308 389 1 2 1 0 0 NA
# 3 2019 11 Cincinnati Beng~ Oakland Raiders 10 17 246 386 2 2 1 0 0 NA
# 4 2019 12 Cincinnati Beng~ Pittsburgh Stee~ 10 16 244 338 2 1 0 0 0 10
# 5 2019 12 Jacksonville Ja~ Tennessee Titans 20 42 369 471 1 2 1 0 0 13
# 6 2019 12 Washington Reds~ Detroit Lions 19 16 230 364 2 4 1 0 1 17
# 7 2019 13 Jacksonville Ja~ Tampa Bay Bucca~ 11 28 242 315 4 1 0 0 0 16.5
# 8 2019 13 Washington Reds~ Carolina Panthe~ 29 21 362 278 0 2 0 0 1 18
# 9 2019 13 Cincinnati Beng~ New York Jets 22 6 277 271 0 0 1 0 1 10
# 10 2019 14 Washington Reds~ Green Bay Packe~ 15 20 262 341 1 1 1 0 0 21.7
...
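For what it's worth, the reason the original mutate() call fails: inside mutate(), Yr, Tm and Wk are whole columns, so the filter ends up comparing Week < Week, which is never TRUE. No rows survive, and the function returns a zero-length vector. The sketch below shows that, and then reproduces the lag(cummean(Pts)) idea in base R for a single team (so no grouping is needed). The three rows are Washington's weeks 11 to 13 from the posted data.

```r
# Washington's weeks 11-13, copied from the question's data.
games <- data.frame(
  Team = rep("Washington Redskins", 3),
  Week = c(13, 12, 11),
  Pts  = c(29, 19, 17)
)

# Comparing a column with itself: no row has Week < Week.
rows_matching <- sum(games$Week < games$Week)

# Running average of points, shifted by one game so each row
# only sees the games played before it.
games <- games[order(games$Week), ]
running_mean <- cumsum(games$Pts) / seq_along(games$Pts)
games$PPG <- c(NA, head(running_mean, -1))
print(games)
```

The week 13 value of 18 matches CurrYrfun(2019, 'Washington Redskins', 13), and the first game of the season has no earlier games, hence the NA.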

How to hide the output of the sna::component.dist() function

How can I hide the output that is printed when calling sna::component.dist(), as below?
g<-sna::rgraph(20,tprob=0.05) #Generate a sparse random graph
g1 <- sna::component.dist(g)
Node 1, Reach 2, Total 2
Node 2, Reach 1, Total 3
Node 3, Reach 13, Total 16
Node 4, Reach 4, Total 20
Node 5, Reach 5, Total 25
Node 6, Reach 2, Total 27
Node 7, Reach 1, Total 28
Node 8, Reach 1, Total 29
Node 9, Reach 1, Total 30
Node 10, Reach 5, Total 35
Node 11, Reach 1, Total 36
Node 12, Reach 5, Total 41
Node 13, Reach 14, Total 55
Node 14, Reach 8, Total 63
Node 15, Reach 7, Total 70
Node 16, Reach 3, Total 73
Node 17, Reach 5, Total 78
Node 18, Reach 5, Total 83
Node 19, Reach 1, Total 84
Node 20, Reach 5, Total 89
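One general approach that may help, sketched with a hypothetical noisy function instead of sna itself: wrap the call in capture.output() and discard what it collects. This assumes the "Node ..." lines are written to R's standard output. If they turn out to be messages instead, suppressMessages() would be the thing to try.

```r
# Hypothetical stand-in for a function that prints progress lines while it works.
noisy_sum <- function(x) {
  for (i in seq_along(x)) cat("Node", i, "\n")
  sum(x)
}

# capture.output() collects what would have been printed;
# invisible() stops the collected lines from being shown.
invisible(capture.output(result <- noisy_sum(1:3)))

print(result)
```

Applied to the question, the same wrapper would look like invisible(capture.output(g1 <- sna::component.dist(g))), though I have not verified how that function produces its output.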
