I have two tables: the first has stress scores recorded at various time points, and the second has the date of treatment. I want the stress scores before and after treatment for each participant who received the treatment. I also want a column indicating when each stress score was recorded relative to treatment. I do not know where to begin or what my code should look like.
score.dt = data.table(
participant.index = c(1, 1, 1, 3, 4, 4, 13, 21, 21, 25, 37, 40, 41, 41, 41, 43, 43, 43, 44),
repeat.instance = c(2, 3, 6, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 3, 1),
date.recorded = c(
'2017-07-13',
'2017-06-26',
'2018-09-17',
'2016-04-14',
'2014-03-24',
'2016-05-30',
'2018-06-20',
'2014-08-03',
'2015-07-06',
'2014-12-17',
'2014-09-05',
'2013-06-10',
'2015-10-04',
'2016-11-04',
'2016-04-18',
'2014-02-13',
'2013-05-24',
'2014-09-10',
'2014-11-25'
),
subscale = c(
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress",
"stress"
),
score = c(18, 10, 18, 36, 16, 30, 28, 10, 12, 40, 16, 12, 10, 14, 6, 32, 42, 26, 18)
)
date.treatment.dt = data.table (
participant.index = c(1, 4, 5, 6, 8, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26),
date.treatment = c(
'2018 - 06 - 27',
'2001 - 07 - 16',
'2009 - 12 - 09',
'2009 - 05 - 20',
'2009 - 07 - 22',
'2008-07 - 02',
'2009 - 11 - 25',
'2009 - 09 - 16',
'1991 - 07 - 30',
'2016 - 05 - 25',
'2012 - 07 - 25',
'2007 - 03 - 19',
'2012 - 01 - 25',
'2011 - 09 - 21',
'2000 - 03 - 06',
'2001 - 09 - 25',
'1999 - 12 - 20',
'1997 -07 - 28',
'2002 - 03 - 12',
'2008 - 01 - 23'
))
The desired output columns are something like this:
score.date.dt = c("candidate.index.x", "repeat.instance", "subscale", "score", "date.treatment", "date.recorded", "score.before.treatment", "score.after.treatment", "months.before.treatment", "months.after.treatment")
Here the column months.before.treatment indicates how many months before treatment the stress score was measured, and months.after.treatment indicates how many months after treatment it was measured.
In your example set, only four individuals with stress scores have any rows in the treatment table (participants 1, 4, 21, and 25). Only one of these, participant 1, has both a pre-treatment and a post-treatment stress measure.
Here is one way to produce the information you need:
library(tidyverse) # provides dplyr, tidyr and stringr, all used below
inner_join(score.dt,date.treatment.dt, by="participant.index") %>%
group_by(participant.index, date.treatment) %>%
summarize(pre_treatment = min(date.recorded[date.recorded<=date.treatment]),
post_treatment = max(date.recorded[date.recorded>=date.treatment])) %>%
pivot_longer(cols = -(participant.index:date.treatment), names_to = "period", values_to = "date.recorded") %>%
left_join(score.dt, by=c("participant.index", "date.recorded" )) %>%
mutate(period=str_extract(period,".*(?=_)"),
months = abs(as.numeric(date.treatment-date.recorded))/(365.25/12)) %>%
pivot_wider(id_cols = participant.index:date.treatment, names_from = period, values_from=c(date.recorded, subscale, months,score))
Output:
participant.index date.treatment date.recorded_pre date.recorded_post subscale_pre subscale_post months_pre months_post score_pre score_post
<dbl> <date> <date> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 2018-06-27 2017-06-26 2018-09-17 stress stress 12.0 2.69 10 18
2 4 2001-07-16 NA 2016-05-30 NA stress Inf 178. NA 30
3 21 2000-03-06 NA 2015-07-06 NA stress Inf 184. NA 12
4 25 2002-03-12 NA 2014-12-17 NA stress Inf 153. NA 40
Note: you will have to fix the date columns in the two source tables first, like this:
# first, strip the stray spaces from your date.treatment column and convert it to Date
date.treatment.dt[, date.treatment := as.Date(str_replace_all(date.treatment," ",""), "%Y-%m-%d")]
# second, similarly fix the date column in your stress score table
score.dt[,date.recorded := as.Date(date.recorded,"%Y-%m-%d")]
There are a few parts to what you're asking. First, you need to merge the two tables. Here I use dplyr::inner_join(), which automatically detects that participant.index is the only column in common and merges on it, discarding records found in only one of the tables. Second, we convert both date columns to a date format so that the elapsed months can be calculated.
library(tidyverse)
library(data.table)
library(lubridate)
score.dt <- structure(list(participant.index = c(1, 1, 1, 3, 4, 4, 13, 21, 21, 25, 37, 40, 41, 41, 41, 43, 43, 43, 44), repeat.instance = c(2, 3, 6, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 3, 1, 2, 3, 1), date.recorded = c("2017-07-13", "2017-06-26", "2018-09-17", "2016-04-14", "2014-03-24", "2016-05-30", "2018-06-20", "2014-08-03", "2015-07-06", "2014-12-17", "2014-09-05", "2013-06-10", "2015-10-04", "2016-11-04", "2016-04-18", "2014-02-13", "2013-05-24", "2014-09-10", "2014-11-25"), subscale = c("stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress", "stress"), score = c(18, 10, 18, 36, 16, 30, 28, 10, 12, 40, 16, 12, 10, 14, 6, 32, 42, 26, 18)), row.names = c(NA, -19L), class = c("data.table", "data.frame"))
date.treatment.dt <- structure(list(participant.index = c(1, 4, 5, 6, 8, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), date.treatment = c("2018 - 06 - 27", "2001 - 07 - 16", "2009 - 12 - 09", "2009 - 05 - 20", "2009 - 07 - 22", "2008-07 - 02", "2009 - 11 - 25", "2009 - 09 - 16", "1991 - 07 - 30", "2016 - 05 - 25", "2012 - 07 - 25", "2007 - 03 - 19", "2012 - 01 - 25", "2011 - 09 - 21", "2000 - 03 - 06", "2001 - 09 - 25", "1999 - 12 - 20", "1997 -07 - 28", "2002 - 03 - 12", "2008 - 01 - 23")), row.names = c(NA, -20L), class = c("data.table", "data.frame"))
inner_join(date.treatment.dt, score.dt) %>%
mutate(across(contains("date"), as_date)) %>%
mutate(months.after = interval(date.treatment, date.recorded) %/% months(1)) %>%
mutate(months.before = 0 - months.after)
#> Joining, by = "participant.index"
#> participant.index date.treatment repeat.instance date.recorded subscale
#> 1: 1 2018-06-27 2 2017-07-13 stress
#> 2: 1 2018-06-27 3 2017-06-26 stress
#> 3: 1 2018-06-27 6 2018-09-17 stress
#> 4: 4 2001-07-16 1 2014-03-24 stress
#> 5: 4 2001-07-16 2 2016-05-30 stress
#> 6: 21 2000-03-06 1 2014-08-03 stress
#> 7: 21 2000-03-06 2 2015-07-06 stress
#> 8: 25 2002-03-12 1 2014-12-17 stress
#> score months.after months.before
#> 1: 18 -11 11
#> 2: 10 -12 12
#> 3: 18 2 -2
#> 4: 16 152 -152
#> 5: 30 178 -178
#> 6: 10 172 -172
#> 7: 12 184 -184
#> 8: 40 153 -153
Created on 2022-04-05 by the reprex package (v2.0.1)
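For readers who want to check the month arithmetic outside R, here is a minimal Python sketch of a signed "whole months between two dates" count. The helper names `parse_spaced_date` and `whole_months_between` are made up for illustration, and the rule used (count complete calendar months, truncating toward zero) is an assumption that happens to reproduce the months.after values printed above; lubridate's handling of month-end dates may differ.

```python
from datetime import date


def parse_spaced_date(text: str) -> date:
    """Parse strings such as '2018 - 06 - 27' by dropping the stray spaces."""
    return date.fromisoformat(text.replace(" ", ""))


def whole_months_between(start: date, end: date) -> int:
    """Signed count of complete calendar months from start to end.

    Assumption: a month only counts once the day-of-month of `start`
    has been reached again; negative results are truncated toward zero.
    """
    if end < start:
        return -whole_months_between(end, start)
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not complete yet
    return months


# Participant 1 from the example data: treatment date vs. recording dates.
treatment = parse_spaced_date("2018 - 06 - 27")
for recorded in ("2017-07-13", "2017-06-26", "2018-09-17"):
    print(recorded, whole_months_between(treatment, parse_spaced_date(recorded)))
```

For participant 1 this prints -11, -12 and 2, the same values as the months.after column above; negating the result gives months.before.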
The axes and variables are the same, but the two models are fitted to different data frames:
mod_a <- gamm(Response ~ s(variable1) + s(variable2) + s(variable3), data=df1)
mod_b <- gamm(Response ~ s(variable1) + s(variable2) + s(variable3), data=df2)
How do I combine them into one plot and color-code each model, so that the plot shows both mod_a and mod_b even though they come from different data frames?
Sample dataset:
df1 <- data.frame (Response = c(00, 17, 03, 23, 02, 21, 24, 21, 16, 24, 15, 28, 07, 30, 11, 07, 21, 14, 10, 05, 14, 17, 02, 03, 18, 28, 05, 16, 14, 02, 18, 26, 30, 06, 11, 06, 25, 03, 20, 19, 30, 16, 24, 12, 22, 20, 23, 20, 14, 26),
variable1 = c(26, 00, 26, 03, 29, 25, 18, 24, 22, 17, 18, 15, 20, 23, 29, 17, 02, 21, 25, 05, 28, 17, 13, 03, 29, 01, 12, 06, 05, 09, 04, 17, 12, 27, 25, 14, 06, 05, 05, 06, 01, 26, 26, 08, 19, 25, 30, 29, 18, 07),
variable2 = c(08, 03, 22, 09, 10, 00, 06, 22, 23, 02, 06, 08, 19, 06, 29, 27, 14, 24, 01, 08, 15, 10, 24, 04, 27, 09, 19, 20, 16, 04, 00, 02, 26, 21, 09, 26, 29, 19, 03, 19, 30, 14, 26, 28, 28, 15, 11, 19, 08, 07),
variable3 = c(12, 07, 15, 21, 23, 19, 02, 00, 28, 27, 08, 22, 04, 18, 14, 18, 15, 20, 27, 19, 24, 07, 05, 26, 05, 28, 21, 26, 22, 30, 18, 01, 19, 05, 24, 18, 29, 15, 06, 11, 19, 13, 16, 07, 22, 08, 27, 17, 21, 25),
variable4 = c(07, 21, 24, 16, 30, 14, 27, 14, 24, 13, 28, 15, 11, 24, 19, 12, 02, 30, 19, 27, 03, 12, 23, 16, 17, 12, 04, 17, 01, 07, 29, 12, 03, 20, 04, 27, 19, 10, 18, 08, 15, 29, 11, 03, 16, 08, 11, 19, 25, 13),
variable5 = c("sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5"))
df2 <- data.frame (Response = c(24, 29, 16, 03, 01, 04, 08, 03, 17, 09, 27, 11, 28, 02, 11, 15, 26, 12, 05, 03, 06, 06, 11, 24, 19, 25, 07, 14, 29, 02, 04, 27, 15, 06, 18, 10, 30, 16, 17, 22, 07, 24, 02, 24, 17, 09, 00, 20, 06, 27),
variable1 = c(22, 11, 19, 08, 03, 16, 04, 20, 12, 25, 08, 21, 04, 07, 09, 28, 25, 04, 27, 17, 00, 22, 29, 08, 17, 06, 12, 16, 08, 00, 16, 24, 20, 09, 10, 10, 04, 24, 11, 00, 07, 21, 15, 11, 05, 00, 07, 05, 25, 03),
variable2 = c(11, 21, 01, 06, 18, 22, 10, 19, 26, 16, 12, 08, 18, 11, 25, 16, 16, 25, 02, 29, 22, 02, 01, 03, 10, 08, 16, 19, 07, 10, 05, 17, 04, 24, 20, 29, 23, 00, 01, 18, 10, 24, 15, 09, 14, 26, 30, 30, 04, 29),
variable3 = c(15, 06, 24, 29, 04, 07, 26, 14, 21, 15, 18, 02, 27, 09, 09, 24, 09, 15, 23, 15, 09, 13, 08, 07, 14, 03, 03, 07, 27, 21, 06, 30, 03, 03, 27, 11, 01, 05, 03, 14, 10, 20, 30, 10, 22, 23, 03, 30, 30, 25),
variable4 = c(03, 22, 10, 07, 23, 08, 12, 06, 25, 17, 12, 28, 21, 28, 18, 21, 15, 17, 23, 10, 11, 21, 12, 10, 26, 04, 18, 18, 26, 25, 20, 02, 15, 28, 17, 04, 14, 28, 01, 13, 16, 05, 14, 02, 06, 15, 16, 26, 29, 07),
variable5 = c("sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5","sq1", "sq2", "sq3", "sq4", "sq5"))
library(mgcv)
mod_a <- gamm(Response ~ s(variable1) + s(variable2) + s(variable3), data=df1)
mod_b <- gamm(Response ~ s(variable1) + s(variable2) + s(variable3), data=df2)
plot(mod_a$gam, pages = 1, shade = TRUE, shade.col = 'gray', residuals = TRUE)
plot(mod_b$gam, pages = 1, shade = TRUE, shade.col = 'gray', residuals = TRUE)
One option is my {gratia} package:
library('dplyr')
library('gratia')
# can't handle gamm objects just yet, so extract the $gam components
ma <- mod_a$gam
mb <- mod_b$gam
then use compare_smooths() which has methods for gam objects
compare_smooths(ma, mb)
this returns a nested tibble
r$> compare_smooths(ma, mb)
# A tibble: 6 × 5
model smooth type by data
<chr> <chr> <chr> <chr> <list>
1 ma s(variable1) TPRS NA <tibble [100 × 3]>
2 mb s(variable1) TPRS NA <tibble [100 × 3]>
3 ma s(variable2) TPRS NA <tibble [100 × 3]>
4 mb s(variable2) TPRS NA <tibble [100 × 3]>
5 ma s(variable3) TPRS NA <tibble [100 × 3]>
6 mb s(variable3) TPRS NA <tibble [100 × 3]>
which has a draw() method:
compare_smooths(ma, mb) %>%
draw()
which produces a figure with one panel per smooth, overlaying the estimates from both models.
If you want to do it for a specific smooth use the smooths argument
r$> compare_smooths(ma, mb, smooths = "s(variable1)")
# A tibble: 2 × 5
model smooth type by data
<chr> <chr> <chr> <chr> <list>
1 ma s(variable1) TPRS NA <tibble [100 × 3]>
2 mb s(variable1) TPRS NA <tibble [100 × 3]>
I will add a method for gamm objects, so in future you should be able to just do
compare_smooths(mod_a, mod_b)
I have a dataframe with different company IDs appearing from once to over 30 times in different rows. I want to add a new column "di_Flex" and fill it with specific values depending on how often the same company ID appears in a column:
If it appears twice in the column, add the value 6 to the new column "di_Flex",
if it appears 3x, add "8",
if it appears 4x add "10",
if it appears 5x add "12.8",
if it appears 6x add "14.67",
if it appears 7 or more times add "16".
Here is the dataframe:
c(0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 7, 7, 8, 9, 9, 9, 10, 10, 11, 11, 12, 12, 13, 14,
15, 16, 17, 17, 18, 18, 19, 20, 21, 22, 23, 23, 23, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25,
25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27,
28, 29, 30, 31, 31, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35, 36,
36, 37, 38, 38, 38, 38, 38, 38, 39, 40, 41, 41, 41, 42, 42, 42,
43, 43, 43, 44, 45, 45, 46, 46, 46, 47, 48, 49, 50, 50, 51, 53,
54, 54, 54, 54, 55, 57, 57, 57, 59, 59, 59, 59, 60, 60, 60, 60,
61, 61, 62, 62, 62, 63, 63, 64, 64, 64, 64, 65, 65, 66, 66, 66,
66, 66, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA)
Thank you for your help!
Assuming your data is called df with a column value:
library(tidyverse)
left_join(df, df %>%
group_by(value) %>%
tally()) %>%
mutate(di_Flex = case_when(n == 2 ~ 6,
n == 3 ~ 8,
n == 4 ~ 10,
n == 5 ~ 12.8,
n == 6 ~ 14.67,
n >= 7 ~ 16)) %>%
select(-n)
This gives us:
value di_Flex
1 0 12.8
2 0 12.8
3 0 12.8
4 0 12.8
5 0 12.8
6 1 NA
7 2 NA
8 3 NA
9 4 NA
10 5 8.0
11 5 8.0
12 5 8.0
13 6 16.0
14 6 16.0
15 6 16.0
16 6 16.0
17 6 16.0
18 6 16.0
19 6 16.0
20 6 16.0
Data:
df <- data.frame(value = c(0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 7, 7, 8, 9, 9, 9, 10, 10, 11, 11, 12, 12, 13, 14,
15, 16, 17, 17, 18, 18, 19, 20, 21, 22, 23, 23, 23, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24,
24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25,
25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27,
28, 29, 30, 31, 31, 32, 32, 32, 33, 33, 33, 34, 34, 34, 35, 36,
36, 37, 38, 38, 38, 38, 38, 38, 39, 40, 41, 41, 41, 42, 42, 42,
43, 43, 43, 44, 45, 45, 46, 46, 46, 47, 48, 49, 50, 50, 51, 53,
54, 54, 54, 54, 55, 57, 57, 57, 59, 59, 59, 59, 60, 60, 60, 60,
61, 61, 62, 62, 62, 63, 63, 64, 64, 64, 64, 65, 65, 66, 66, 66,
66, 66, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA))
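The count-then-map logic is independent of dplyr. As a rough illustration, here is a minimal Python sketch using only the standard library. The names `DI_FLEX_BY_COUNT` and `di_flex` are hypothetical, and missing IDs (None here) are left without a value, which is an assumption and may differ from how the pipeline above treats NA.

```python
from collections import Counter

# Hypothetical lookup table built from the rules stated in the question.
DI_FLEX_BY_COUNT = {2: 6.0, 3: 8.0, 4: 10.0, 5: 12.8, 6: 14.67}


def di_flex(values):
    """Return one di_Flex value per row, based on how often its ID occurs."""
    counts = Counter(v for v in values if v is not None)
    result = []
    for v in values:
        n = counts.get(v, 0)
        if n >= 7:
            result.append(16.0)
        else:
            # IDs seen once (or missing) have no rule, so they stay None.
            result.append(DI_FLEX_BY_COUNT.get(n))
    return result


# The first 20 rows of the example data: 0 x5, 1-4 once each, 5 x3, 6 x8 shown.
# The full data has twelve 6s, so they are all included to get the right count.
ids = [0] * 5 + [1, 2, 3, 4] + [5] * 3 + [6] * 12
for company_id, value in list(zip(ids, di_flex(ids)))[:20]:
    print(company_id, value)
```

The first 20 printed pairs line up with the output shown above (12.8 for ID 0, nothing for IDs 1 to 4, 8.0 for ID 5, 16.0 for ID 6).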
I am trying to make an animated bubble chart for a baseball league I'm in. Once I create the animated graph and convert it into a gif, it plots each team twice, as shown in the picture below. The legend should only hold 14 points/teams, but it shows 28 instead.
My code is the following:
library(ggplot2)
library(gganimate)
library(readxl)
library(gifski)
library(png)
myData <- read_excel("~/Desktop/Dynasty - Fantasy Baseball.xlsx")
# Make a ggplot, but add frame=year: one image per year
g <- ggplot(myData, aes(PF, PA, size = `W%`, color = Team)) +
geom_point() +
theme_bw() +
# gganimate specific bits:
labs(title = 'Period: {frame_time-1900}', x = 'Points For', y = 'Points Against') +
transition_time(Year) +
ease_aes('linear')
# Save as gif:
anim_save(filename = "~/Desktop/FantasyBaseballAnimated.gif", animation = g)
My data is stored in the following:
structure(list(Team = c("Houston Astros", "Miami Marlins", "New York Mets",
"Atlanta Braves", "St. Louis Cardinals", "Cincinatti Reds", "Philadelphia Reds",
"Baltimore Orioles", "Milwaukee Brewers", "Washington Nationals",
"Montreal Expos", "Tampa Bay Rays", "Seattle Mariners", "Brooklyn Dodgers",
"Houston Astros", "Miami Marlins", "New York Mets", "Atlanta Braves",
"St. Louis Cardinals", "Cincinatti Reds", "Philadelphia Reds",
"Baltimore Orioles", "Milwaukee Brewers", "Washington Nationals",
"Montreal Expos", "Tampa Bay Rays", "Seattle Mariners", "Brooklyn Dodgers",
"New York Mets ", "St. Louis Cardinals ", "Cincinatti Reds ",
"Washington Nationals ", "Atlanta Braves ", "Miami Marlins ",
"Philadelphia Phillies ", "Tampa Bay Rays ", "Houston Astros ",
"Montreal Expos ", "Baltimore Orioles ", "Milwaukee Brewers ",
"Seattle Mariners ", "Brooklyn Dodgers ", "St. Louis Cardinals ",
"Washington Nationals ", "Miami Marlins ", "Cincinatti Reds ",
"New York Mets ", "Atlanta Braves ", "Tampa Bay Rays ", "Houston Astros ",
"Milwaukee Brewers ", "Philadelphia Phillies ", "Baltimore Orioles ",
"Montreal Expos ", "Seattle Mariners ", "Brooklyn Dodgers ",
"Washington Nationals ", "St. Louis Cardinals ", "Atlanta Braves ",
"Cincinatti Reds ", "New York Mets ", "Houston Astros ", "Miami Marlins ",
"Philadelphia Phillies ", "Tampa Bay Rays ", "Milwaukee Brewers ",
"Baltimore Orioles ", "Montreal Expos ", "Seattle Mariners ",
"Brooklyn Dodgers ", "St. Louis Cardinals ", "Washington Nationals ",
"Philadelphia Phillies ", "Miami Marlins ", "Atlanta Braves ",
"New York Mets ", "Houston Astros ", "Milwaukee Brewers ",
"Cincinatti Reds ", "Tampa Bay Rays ", "Montreal Expos ",
"Baltimore Orioles ", "Seattle Mariners ", "Brooklyn Dodgers ",
"New York Mets ", "St. Louis Cardinals ", "Washington Nationals ",
"Philadelphia Phillies ", "Miami Marlins ", "Houston Astros ",
"Atlanta Braves ", "Milwaukee Brewers ", "Cincinatti Reds ",
"Tampa Bay Rays ", "Montreal Expos ", "Baltimore Orioles ",
"Seattle Mariners ", "Brooklyn Dodgers ", "St. Louis Cardinals ",
"Washington Nationals ", "Houston Astros ", "New York Mets ",
"Philadelphia Phillies ", "Milwaukee Brewers ", "Atlanta Braves ",
"Miami Marlins ", "Cincinatti Reds ", "Tampa Bay Rays ", "Baltimore Orioles ",
"Montreal Expos ", "Seattle Mariners ", "Brooklyn Dodgers "
), W = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 9, 8,
7, 6, 6, 5, 6, 5, 4, 3, 2, 2, 2, 17, 17, 16, 14, 14, 14, 12,
11, 13, 7, 7, 6, 3, 3, 25, 24, 22, 21, 20, 20, 18, 19, 16, 14,
12, 9, 8, 5, 33, 32, 27, 27, 25, 26, 25, 23, 21, 21, 16, 15,
11, 7, 37, 37, 35, 34, 33, 32, 32, 29, 29, 27, 21, 19, 17, 7,
44, 43, 43, 40, 38, 40, 37, 37, 35, 32, 25, 23, 20, 7, 52, 50,
50, 48, 48, 43, 42, 40, 41, 38, 34, 28, 25, 8), L = c(0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 3, 4, 6, 5, 6, 5, 6,
7, 8, 9, 10, 5, 5, 7, 7, 8, 9, 9, 9, 11, 14, 15, 15, 19, 21,
8, 9, 11, 13, 13, 13, 14, 16, 17, 19, 21, 22, 26, 31, 11, 12,
16, 19, 18, 19, 20, 22, 21, 22, 28, 28, 33, 40, 18, 18, 22, 22,
22, 22, 25, 25, 28, 27, 34, 36, 38, 52, 22, 22, 22, 28, 27, 29,
28, 28, 33, 31, 42, 42, 46, 64, 25, 27, 31, 30, 32, 33, 34, 37,
39, 37, 43, 51, 53, 75), T = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 2, 0, 2, 2, 2, 2, 1, 0, 2, 2, 1,
3, 2, 1, 3, 4, 0, 3, 2, 3, 2, 0, 3, 3, 3, 2, 3, 3, 4, 1, 3, 3,
3, 5, 2, 0, 4, 4, 5, 2, 5, 3, 3, 3, 6, 5, 4, 5, 4, 1, 5, 5, 3,
4, 5, 6, 3, 6, 3, 6, 5, 5, 5, 1, 6, 7, 7, 4, 7, 3, 7, 7, 4, 9,
5, 7, 6, 1, 7, 7, 3, 6, 4, 8, 8, 7, 4, 9, 7, 5, 6, 1), `W%` = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.833, 0.792, 0.75, 0.667,
0.583, 0.5, 0.5, 0.5, 0.5, 0.417, 0.333, 0.25, 0.208, 0.167,
0.75, 0.75, 0.688, 0.646, 0.625, 0.604, 0.562, 0.542, 0.542,
0.354, 0.333, 0.312, 0.167, 0.125, 0.736, 0.708, 0.653, 0.611,
0.597, 0.597, 0.556, 0.542, 0.486, 0.431, 0.375, 0.319, 0.25,
0.139, 0.729, 0.708, 0.615, 0.583, 0.573, 0.573, 0.552, 0.51,
0.5, 0.49, 0.375, 0.365, 0.271, 0.156, 0.658, 0.658, 0.608, 0.6,
0.592, 0.583, 0.558, 0.533, 0.508, 0.5, 0.392, 0.358, 0.325,
0.125, 0.653, 0.646, 0.646, 0.583, 0.576, 0.576, 0.562, 0.562,
0.514, 0.507, 0.382, 0.368, 0.319, 0.104, 0.661, 0.637, 0.613,
0.607, 0.595, 0.56, 0.548, 0.518, 0.512, 0.506, 0.446, 0.363,
0.333, 0.101), `Div Rec` = c("0", "0", "0", "0", "0", "0", "0",
"0", "0", "0", "0", "0", "0", "0", "0-0-0", "0-0-0", "37470",
"0-0-0", "0-0-0", "36683", "0-0-0", "36683", "0-0-0", "0-0-0",
"0-0-0", "37295", "0-0-0", "0-0-0", "17-5-2", "0-0-0", "36683",
"0-0-0", "36712", "36653", "0-0-0", "37295", "36594", "0-0-0",
"36683", "0-0-0", "0-0-0", "0-0-0", "37106", "36801", "36653",
"37207", "20-13-3", "13-10-1", "37512", "36594", "0-0-0", "36566",
"36683", "0-0-0", "36653", "0-0-0", "19-4-1", "37106", "13-10-1",
"37207", "25-18-5", "37541", "36754", "36843", "37512", "37381",
"36683", "0-0-0", "37482", "36931", "13-9-2", "19-4-1", "23-13-0",
"17-18-1", "13-10-1", "25-18-5", "37541", "37381", "13-21-2",
"15-19-2", "36683", "36683", "14-19-3", "36943", "25-18-5", "13-9-2",
"25-8-3", "28-19-1", "17-18-1", "18-16-2", "13-10-1", "13-8-3",
"19-26-3", "15-19-2", "36813", "37541", "17-27-4", "36943", "22-12-2",
"25-8-3", "18-16-2", "25-18-5", "28-19-1", "13-8-3", "13-10-1",
"17-18-1", "19-26-3", "15-19-2", "21-13-2", "13-23-0", "17-27-4",
"3-32-1"), GB = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0.5, 1, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7.5, 8, 0, 0, 1.5, 2.5, 3,
3.5, 4.5, 5, 5, 9.5, 10, 10.5, 14, 15, 0, 1, 3, 4.5, 5, 5, 6.5,
7, 9, 11, 13, 15, 17.5, 21.5, 0, 1, 5.5, 7, 7.5, 7.5, 8.5, 10.5,
11, 11.5, 17, 17.5, 22, 27.5, 0, 0, 3, 3.5, 4, 4.5, 6, 7.5, 9,
9.5, 16, 18, 20, 32, 0, 0.5, 0.5, 5, 5.5, 5.5, 6.5, 6.5, 10,
10.5, 19.5, 20.5, 24, 39.5, 0, 2, 4, 4.5, 5.5, 8.5, 9.5, 12,
12.5, 13, 18, 25, 27.5, 47), PF = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 10, 9.5, 9, 8, 7, 6, 6, 6, 6, 5, 4, 3, 2.5, 2,
18, 18, 16.5, 15.5, 15, 14.5, 13.5, 13, 13, 8.5, 8, 7.5, 4, 3,
26.5, 25.5, 23.5, 22, 21.5, 21.5, 20, 19.5, 17.5, 15.5, 13.5,
11.5, 9, 5, 35, 34, 29.5, 28, 27.5, 27.5, 26.5, 24.5, 24, 23.5,
18, 17.5, 13, 7.5, 39.5, 39.5, 36.5, 36, 35.5, 35, 33.5, 32,
30.5, 30, 23.5, 21.5, 19.5, 7.5, 47, 46.5, 46.5, 42, 41.5, 41.5,
40.5, 40.5, 37, 36.5, 27.5, 26.5, 23, 7.5, 55.5, 53.5, 51.5,
51, 50, 47, 46, 43.5, 43, 42.5, 37.5, 30.5, 28, 8.5), PA = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2.5, 3, 4, 5, 6, 6,
6, 6, 7, 8, 9, 9.5, 10, 6, 6, 7.5, 8.5, 9, 9.5, 10.5, 11, 11,
15.5, 16, 16.5, 20, 21, 9.5, 10.5, 12.5, 14, 14.5, 14.5, 16,
16.5, 18.5, 20.5, 22.5, 24.5, 27, 31, 13, 14, 18.5, 20, 20.5,
20.5, 21.5, 23.5, 24, 24.5, 30, 30.5, 35, 40.5, 20.5, 20.5, 23.5,
24, 24.5, 25, 26.5, 28, 29.5, 30, 36.5, 38.5, 40.5, 52.5, 25,
25.5, 25.5, 30, 30.5, 30.5, 31.5, 31.5, 35, 35.5, 44.5, 45.5,
49, 64.5, 28.5, 30.5, 32.5, 33, 34, 37, 38, 40.5, 41, 41.5, 46.5,
53.5, 56, 75.5), Period = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7), Place = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), Year = c(1900,
1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900, 1900,
1900, 1900, 1901, 1901, 1901, 1901, 1901, 1901, 1901, 1901, 1901,
1901, 1901, 1901, 1901, 1901, 1902, 1902, 1902, 1902, 1902, 1902,
1902, 1902, 1902, 1902, 1902, 1902, 1902, 1902, 1903, 1903, 1903,
1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903, 1903,
1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904, 1904,
1904, 1904, 1904, 1905, 1905, 1905, 1905, 1905, 1905, 1905, 1905,
1905, 1905, 1905, 1905, 1905, 1905, 1906, 1906, 1906, 1906, 1906,
1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1906, 1907, 1907,
1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907, 1907,
1907)), row.names = c(NA, -112L), class = c("tbl_df", "tbl",
"data.frame"))
I thought converting Team to a factor would work, and I also tried parsing it, but neither worked:
#first thought
myData$Team <- factor(myData$Team)
summary(myData)
#second thought
myData$Team <- eval(parse(text = myData$Team))
Am I just missing something obvious? I'm drawing a blank at how I could fix this. Any help would be greatly appreciated!
It looks like you need to do some data cleaning:
data %>% group_by(Team) %>%
summarise(count = n())
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 28 x 2
Team count
<chr> <int>
1 "Atlanta Braves" 2
2 "Atlanta Braves " 6
3 "Baltimore Orioles" 2
4 "Baltimore Orioles " 6
5 "Brooklyn Dodgers" 2
6 "Brooklyn Dodgers " 6
7 "Cincinatti Reds" 2
8 "Cincinatti Reds " 6
9 "Houston Astros" 2
10 "Houston Astros " 6
# ... with 18 more rows
Using stringr:
data <- data %>%
mutate(Team = str_trim(Team, side = c("both")))
Answer
Remove the whitespace around the names:
myData$Team <- trimws(myData$Team)
Rationale
You actually have each team in there twice: half of the 28 distinct names end with a single trailing space. You may want to look into why that is happening.
table(myData$Team, myData$Year)[1:2, ]
# 1900 1901 1902 1903 1904 1905 1906 1907
# Atlanta Braves 1 1 0 0 0 0 0 0
# Atlanta Braves 0 0 1 1 1 1 1 1
sort(unique(myData$Team))[1:2]
#[1] "Atlanta Braves" "Atlanta Braves "
I have a file foo.txt that looks like this:
7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18, 5, 13, 34, 24, 17, 50, 30, 42, 35, 29, 27, 52, 35, 44, 52, 36, 39, 25, 40, 50, 52, 40, 2, 52, 52, 31, 35, 30, 19, 32, 46, 50, 43, 36, 15, 21, 16, 36, 25, 7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6
I want to read the numbers in sets of 15, moving to the right one number at a time:
7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5
then
3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5, 22
and so on.
If 7 or more of those 15 numbers are >= 10, keep them in a growing object that ends when the condition is no longer met. So the first set to keep would be
3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18, 5, 13
because 7 of those 15 numbers are >= 10 (those numbers are 22, 18, 14, 23, 16, 18 and 13).
The output file would look like this:
3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18, 5, 13, 34, 24, 17, 50, 30, 42, 35, 29, 27, 52, 35, 44, 52, 36, 39, 25, 40, 50, 52, 40, 2, 52, 52, 31, 35, 30, 19, 32, 46, 50, 43, 36, 15, 21, 16, 36, 25, 7, 3, 5, 7, 3, 3, 3, 3
So far I can get the sets of 15 numbers, but I don't know how to express the condition "7 or more must be >= 10":
qual <- readLines("foo.txt", 1)
separados <- unlist(strsplit(qual, ", "))
for (i in 1:length(qual)) {
separados[(i):(i + 14)] -> numbers
I don't mind the language as long as it does the work
I added two = signs to Vlo's solution and put this together for you. Does this answer your question?
foo.txt <- c(7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16, 18, 5,
13, 34, 24, 17, 50, 30, 42, 35, 29, 27, 52, 35, 44, 52, 36, 39, 25, 40,
50, 52, 40, 2, 52, 52, 31, 35, 30, 19, 32, 46, 50, 43, 36, 15, 21, 16,
36, 25, 7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6)
# install.packages(c("zoo"), dependencies = TRUE)
require(zoo)
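bar <- rollapply(foo.txt, 15, function(x) sum(x >= 10) >= 7)
# bar[i] flags the window starting at position i, so keep every position
# covered by a flagged window rather than indexing foo.txt with bar directly
keep <- unique(unlist(lapply(which(bar), function(i) i:(i + 14))))
(product <- foo.txt[keep])
[1] 3 3 3 6 7 5 5 22 18 14 23 16 18 5 13 34 24 17 50 30 42 35 29 27
[25] 52 35 44 52 36 39 25 40 50 52 40 2 52 52 31 35 30 19 32 46 50 43 36 15
[49] 21 16 36 25 7 3 5 7 3 3 3 3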
I would do it in Python (you said you don't mind the language):
array = []
with open("foo.txt", "r") as f:
    for line in f:
        for num in line.strip().split(', '):
            array.append(int(num))
result = []
growing = False
while len(array) >= 15:
    # does the current window of 15 hold at least 7 numbers >= 10?
    if sum(1 for x in array[:15] if x >= 10) >= 7:
        if growing:
            # the previous window was already kept; only the last number is new
            result.append(array[14])
        else:
            result.extend(array[:15])
            growing = True
    else:
        growing = False
    del array[0]
print(str(result)[1:-1])
Short explanation: the first loop simply reads the lines in the file, strips the end of line, splits each line on ", " and appends each number to array.
The while loop checks the first 15 numbers in array; if at least 7 of them are >= 10, it appends either all 15 numbers or just the last one (depending on whether the previous window also qualified) to result. At the end of each iteration, it removes the first number in array so that the loop can continue with the next 15 numbers.
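A compact variant of the same idea, as a minimal sketch: instead of growing a list window by window, it marks every position covered by a qualifying window and then keeps the marked values. The function name `keep_dense_runs` and its parameters are made up for illustration, and the input is hard-coded here rather than read from foo.txt.

```python
def keep_dense_runs(numbers, width=15, threshold=10, min_hits=7):
    """Keep every value that lies in at least one qualifying window.

    A window of `width` consecutive values qualifies when at least
    `min_hits` of them are >= `threshold`.
    """
    keep = [False] * len(numbers)
    for start in range(len(numbers) - width + 1):
        window = numbers[start:start + width]
        if sum(x >= threshold for x in window) >= min_hits:
            for i in range(start, start + width):
                keep[i] = True
    return [x for x, flagged in zip(numbers, keep) if flagged]


numbers = [7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6, 7, 5, 5, 22, 18, 14, 23, 16,
           18, 5, 13, 34, 24, 17, 50, 30, 42, 35, 29, 27, 52, 35, 44, 52, 36,
           39, 25, 40, 50, 52, 40, 2, 52, 52, 31, 35, 30, 19, 32, 46, 50, 43,
           36, 15, 21, 16, 36, 25, 7, 3, 5, 7, 3, 3, 3, 3, 3, 3, 3, 6]

result = keep_dense_runs(numbers)
print(", ".join(str(x) for x in result))
```

Marking positions rather than appending values means overlapping windows never add a number twice, even if a run stops and a new one starts fewer than 15 numbers later. On the sample data the result is the 60 numbers from the first kept set (3, 3, 3, 6, 7, ...) through the last kept 3, which is the output requested in the question.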