Adding data labels to points in ggplot: label argument not working

I know that this question has been answered a few times here and here. The key point seems to be to include the argument label in aes. But, for me ggplot does not accept label as an argument in aes. I tried using the generic function labels in aes as below, but that didn't work to create labels for points, though I am able to generate a graph:
launch_curve <- ggplot(data = saltsnck_2002_plot_t,aes(x=weeks,y=markets, labels(c(1,2,3,4,5,6,7,8,9,10,11,12))))+
geom_line()+geom_point()+
scale_x_continuous(breaks = seq(0,12,by=1))+
scale_y_continuous(limits=c(0,50), breaks = seq(0,50,by=5))+
xlab("Weeks since launch")+ ylab("No. of markets")+
ggtitle(paste0(marker1,marker2))+
theme(plot.title = element_text(size = 10))
print(launch_curve)
Does anyone know a way around this? I am using R version 3.4.3.
Edited to include sample data:
The data that I use to plot is in the dataframe saltsnck_2002_plot_t. (12 rows by 94 cols). A sample is given below:
>saltsnck_2002_plot_t
11410008398 11600028960 11819570760 11819570761 12325461033 12325461035 12325461037
Week1 3 2 2 1 2 2 1
Week2 6 16 10 1 3 2 2
Week3 11 41 13 10 3 3 2
Week4 15 46 14 14 3 4 4
Week5 15 48 15 14 3 4 4
Week6 27 48 15 15 3 4 4
Week7 31 50 15 15 3 4 5
Week8 33 50 16 16 5 5 6
Week9 34 50 18 16 5 5 6
Week10 34 50 21 19 5 5 6
Week11 34 50 23 21 5 5 6
Week12 34 50 24 23 5 5 6
I am actually plotting graphs in a loop by moving through the columns of the dataframe. This dataframe is the result of a transpose operation, hence the weird row and column names. The graph for the first column looks like the one below. And a correction from my earlier post, I need to capture as data labels the values in the column and not c(1,2,3,4,5,6,7,8,9,10,11,12).

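Use geom_text and map label inside its aes(). The version below labels each point with weeks; map label = markets instead to print the values themselves.
library(ggplot2)
ggplot(data = df, aes(x = weeks_num, y = markets)) +
  geom_line() + geom_point() +
  geom_text(aes(label = weeks), hjust = 0.5, vjust = -1) +
  scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, by = 5)) +
  # the default axis labels are the break values, so no labels argument is needed
  scale_x_continuous(breaks = seq(1, 12, by = 1)) +
  xlab("Weeks since launch") + ylab("No. of markets") +
  ggtitle("markets") +  # replace with your own title, e.g. paste0(marker1, marker2)
  theme(plot.title = element_text(size = 10))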
Data
df <- structure(list(weeks_num = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12), weeks = structure(1:12, .Label = c("week1", "week2", "week3",
"week4", "week5", "week6", "week7", "week8", "week9", "week10",
"week11", "week12"), class = c("ordered", "factor")), markets = c(3,
6, 11, 15, 27, 31, 33, 34, 34, 34, 34, 34)), .Names = c("weeks_num",
"weeks", "markets"), row.names = c(NA, -12L), class = "data.frame")
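Since the question was edited to ask for the column values as labels, here is a minimal sketch of that variant. It assumes the first column of the question's table (11410008398) has been copied into a small data frame; the name plot_df is made up for illustration.

```r
library(ggplot2)

# First product column of the question's table (11410008398), weeks 1 to 12.
# plot_df is a hypothetical name, not an object from the original post.
plot_df <- data.frame(
  weeks   = 1:12,
  markets = c(3, 6, 11, 15, 15, 27, 31, 33, 34, 34, 34, 34)
)

p <- ggplot(plot_df, aes(x = weeks, y = markets)) +
  geom_line() +
  geom_point() +
  # label is an aesthetic of geom_text, so it is mapped inside aes()
  geom_text(aes(label = markets), vjust = -1) +
  scale_x_continuous(breaks = 1:12) +
  scale_y_continuous(limits = c(0, 50), breaks = seq(0, 50, by = 5)) +
  labs(x = "Weeks since launch", y = "No. of markets", title = "11410008398") +
  theme(plot.title = element_text(size = 10))

print(p)
```

When plotting in a loop over columns, the same code could be wrapped in a function that takes the column name and builds plot_df from it.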

Related

R : regression line interrupted in ggplot while a continuous line is expected

I created a multilevel regression model with nlme package and now I would like to illustrate the regression line obtained for some patients (unfortunately I cannot use geom_smooth with nlme).
So using the model I obtained the following predicted values (predicted_value) at different times (date_day) and here for two patients (ID1 and ID2).
df <- data.frame (ID = c (rep (1, 10), rep(2, 10)),
date_day = c (7:16, 7:16),
predicted_value = c (33, 33, 33, 33, 33, NA, 34, NA, NA, NA,
55, NA, NA, 53.3, NA, NA, 51.6, NA, 50.5, NA))
ID date_day predicted_value
1 1 7 33.0
2 1 8 33.0
3 1 9 33.0
4 1 10 33.0
5 1 11 33.0
6 1 12 NA
7 1 13 34.0
8 1 14 NA
9 1 15 NA
10 1 16 NA
11 2 7 55.0
12 2 8 NA
13 2 9 NA
14 2 10 53.3
15 2 11 NA
16 2 12 NA
17 2 13 51.6
18 2 14 NA
19 2 15 50.5
20 2 16 NA
Now I would like to draw the regression line for each of these patients. So I tried the following
ggplot(df%>% filter(ID %in% c("1", "2")))+
aes(x = date_day, y = predicted_value) +
geom_point(shape = "circle", size = 1.5, colour = "#112446", na.rm = T) +
geom_line(aes(y = predicted_value), na.rm = T, size = 1) +
theme_minimal() +
facet_wrap(vars(ID)) +
scale_x_continuous(name="days", limits=c(7, 16)) +
scale_y_continuous(name="predicted values", limits=c(0, 60))
But I end up with the following plots: for patient 1 the line is interrupted, and for patient 2 there is no line at all. How can I fix that?
Thanks a lot
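Thank you @BenBolker. Indeed, changing the first line
ggplot(df%>% filter(ID %in% c("1", "2")))
to
ggplot(na.omit(df)%>% filter(ID %in% c("1", "2")))
solved the problem: geom_line breaks the line at every missing value (na.rm only silences the warning), so dropping the NA rows first lets it connect the remaining points.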
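A self-contained sketch of the fix, using the data from the question. Here the NA rows are dropped with base subsetting rather than na.omit, which is an equivalent choice for this data frame; plot_df is a name introduced for illustration.

```r
library(ggplot2)

df <- data.frame(
  ID = rep(1:2, each = 10),
  date_day = rep(7:16, times = 2),
  predicted_value = c(33, 33, 33, 33, 33, NA, 34, NA, NA, NA,
                      55, NA, NA, 53.3, NA, NA, 51.6, NA, 50.5, NA)
)

# Keep only the rows with a prediction, so each patient's points are
# consecutive rows and geom_line can join them.
plot_df <- df[!is.na(df$predicted_value), ]

p <- ggplot(plot_df, aes(x = date_day, y = predicted_value)) +
  geom_point(colour = "#112446", size = 1.5) +
  geom_line() +
  facet_wrap(vars(ID)) +
  scale_x_continuous(name = "days", limits = c(7, 16)) +
  scale_y_continuous(name = "predicted values", limits = c(0, 60)) +
  theme_minimal()

print(p)
```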

With R, how do i assign values to a new column based on numbers that fall within a range?

I have a column called Report age whose values can range from 0 to 100.
Report age|
5
82
17
39
67
I would like to create a script that assigns a new column called Age Group
Report age|Age Group|
5          5 to 9
82         80 to 84
17         15 to 19
39         35 to 39
67         65 to 69
I know that if I have
df <-df %>%
mutate(Age_Group = ifelse(`Report age` <5, "Under 5", No))
I will get only two outcomes. I want to set up many more: Under 5, 5 to 9, 10 to 14, 15 to 19, and so on until "85 years and over".
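We can use cut to create the groups. The breaks are the lower bounds of the five-year groups, and right = FALSE makes each interval include its lower bound, so that 5 falls in "5 to 9" rather than "Under 5".
library(dplyr)
# lower bounds of the groups "5 to 9", "10 to 14", ..., "80 to 84"
lower <- seq(5, 80, by = 5)
df %>%
  mutate(Age_Group = cut(`Report age`,
    breaks = c(-Inf, lower, 85, Inf),
    right = FALSE,
    labels = c("Under 5", paste(lower, "to", lower + 4),
               "85 years and over")))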
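As a quick check of the binning on the question's five sample ages, here is a base-R sketch. It assumes five-year groups running from "Under 5" to "85 years and over"; the data frame name ages is made up for illustration.

```r
# Sample ages from the question; check.names = FALSE keeps the space in the name
ages <- data.frame(`Report age` = c(5, 82, 17, 39, 67), check.names = FALSE)

# Lower bounds of the groups "5 to 9", "10 to 14", ..., "80 to 84"
lower <- seq(5, 80, by = 5)

ages$Age_Group <- cut(
  ages$`Report age`,
  breaks = c(-Inf, lower, 85, Inf),
  right  = FALSE,  # intervals are [lower, upper), so 5 lands in "5 to 9"
  labels = c("Under 5", paste(lower, "to", lower + 4), "85 years and over")
)

print(ages)
```

There are 19 break points and therefore 18 intervals, matching the 18 labels; cut stops with an error if the two counts disagree.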

Merging rows with same value with conditions for keeping multiple dummies

Creating a subset example of the DF (the code for a part of the actual one is at the end)
ANO_CENSO PK_COD_TURMA PK_COD_ENTIDADE MAIS_ENSINO_FUND MAIS_ENSINO_MED ENSINO_INTEG_FUND ENSINO_INTEG_MED
2011 27 12 1 0 0 1
2011 41 12 1 1 0 0
2011 18 13 0 0 0 1
2011 16 14 1 1 0 1
I want to merge the rows that share the same PK_COD_ENTIDADE into a single row, keeping a "1" in every dummy that is "1" in any of those rows. I don't care about the different values of PK_COD_TURMA; it doesn't matter which one ends up in the final DF (27 or 41).
My DF has several variables like PK_COD_TURMA whose final value I don't care about. The important ones are PK_COD_ENTIDADE and the dummies with value "1".
It would look like this at the end:
ANO_CENSO PK_COD_TURMA PK_COD_ENTIDADE MAIS_ENSINO_FUND MAIS_ENSINO_MED ENSINO_INTEG_FUND ENSINO_INTEG_MED
2011 27 12 1 1 0 1
2011 18 13 0 0 0 1
2011 16 14 1 1 0 1
Note that the first row with PK_COD_ENTIDADE = 12 has a "1" in two dummies and the second row with PK_COD_ENTIDADE = 12 has a "1" in a different dummy. In the result they are merged into a single row for that PK_COD_ENTIDADE, which keeps every "1". A dummy that is "1" in both rows stays 1 rather than summing to 2, because these are dummies.
I have no idea how to do this. I searched for solutions with dplyr but couldn't get anything close to working.
Here is the structure of the df with all variables:
dftest2 <- structure(list(ANO_CENSO = c(2011, 2011, 2011, 2011), PK_COD_TURMA = c(27,
41, 18, 16), NU_DURACAO_TURMA = c(250, 255, 255,
255), FK_COD_ETAPA_ENSINO = c(41, 19, 19, 19), PK_COD_ENTIDADE = c(12,
12, 13, 14), FK_COD_ESTADO = c(11, 11, 11,
11), SIGLA = c("RO", "RO", "RO", "RO"), FK_COD_MUNICIPIO = c(1100023,
1100023, 1100023, 1100023), ID_LOCALIZACAO = c(1, 1, 1, 1), ID_DEPENDENCIA_ADM = c(2,
2, 2, 2), MAIS_ENSINO_FUND = c(1, 1, 0, 1), MAIS_ENSINO_MED = c(0,
1, 0, 1), ENSINO_INTEG_FUND = c(0L, 0L, 0L, 0L), ENSINO_INTEG_MED = c(1L,
0L, 1L, 1L)), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
The sample data you give for dftest2 does not match the data you present at the beginning of your post.
In response to your question, an option is to use aggregate
aggregate(
. ~ PK_COD_ENTIDADE,
data = transform(dftest2, SIGLA = as.factor(SIGLA)),
FUN = max)
#  PK_COD_ENTIDADE ANO_CENSO PK_COD_TURMA NU_DURACAO_TURMA FK_COD_ETAPA_ENSINO
#1 12 2011 41 255 41
#2 13 2011 18 255 19
#3 14 2011 16 255 19
# FK_COD_ESTADO SIGLA FK_COD_MUNICIPIO ID_LOCALIZACAO ID_DEPENDENCIA_ADM
#1 11 1 1100023 1 2
#2 11 1 1100023 1 2
#3 11 1 1100023 1 2
# MAIS_ENSINO_FUND MAIS_ENSINO_MED ENSINO_INTEG_FUND ENSINO_INTEG_MED
#1 1 1 0 1
#2 0 0 0 1
#3 1 1 0 1
Explanation: We first convert the character column SIGLA to a factor; then we aggregate data in all columns (except PK_COD_ENTIDADE) by PK_COD_ENTIDADE, and return the max value (which should be consistent with your problem statement).
You can do something similar using dplyr's group_by and summarise_all
library(dplyr)
dftest2 %>%
group_by(PK_COD_ENTIDADE) %>%
summarise_all(~ifelse(is.character(.x), last(.x), max(.x))) %>%
ungroup()
# A tibble: 3 x 14
PK_COD_ENTIDADE ANO_CENSO PK_COD_TURMA NU_DURACAO_TURMA FK_COD_ETAPA_EN…
<dbl> <dbl> <dbl> <dbl> <dbl>
1 12 2011 41 255 41
2 13 2011 18 255 19
3 14 2011 16 255 19
# … with 9 more variables: FK_COD_ESTADO <dbl>, SIGLA <chr>,
# FK_COD_MUNICIPIO <dbl>, ID_LOCALIZACAO <dbl>, ID_DEPENDENCIA_ADM <dbl>,
# MAIS_ENSINO_FUND <dbl>, MAIS_ENSINO_MED <dbl>, ENSINO_INTEG_FUND <int>,
# ENSINO_INTEG_MED <int>
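In current dplyr, summarise_all is superseded by across(). A minimal sketch of the same idea follows, on a cut-down copy of dftest2 that keeps only a few of its columns; the name dummies is made up for illustration.

```r
library(dplyr)

# A reduced copy of dftest2 (hypothetical name), with values taken from the question
dummies <- data.frame(
  PK_COD_TURMA      = c(27, 41, 18, 16),
  PK_COD_ENTIDADE   = c(12, 12, 13, 14),
  SIGLA             = c("RO", "RO", "RO", "RO"),
  MAIS_ENSINO_FUND  = c(1, 1, 0, 1),
  MAIS_ENSINO_MED   = c(0, 1, 0, 1),
  ENSINO_INTEG_FUND = c(0L, 0L, 0L, 0L),
  ENSINO_INTEG_MED  = c(1L, 0L, 1L, 1L)
)

merged <- dummies %>%
  group_by(PK_COD_ENTIDADE) %>%
  summarise(
    # character columns: keep the last value; everything else: keep the max,
    # so a dummy is 1 if it is 1 in any row of the group
    across(everything(),
           function(x) if (is.character(x)) last(x) else max(x)),
    .groups = "drop"
  )

print(merged)
```

Taking the max of a 0/1 dummy acts as a logical "any", which is why duplicated 1s never add up to 2.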

Need to create a variable based on the equality of other variables

I have a dataset called CSES (Comparative Study of Electoral Systems) where each row corresponds to an individual (one interview in a public opinion survey), from many countries, in many different years.
I need to create a variable which identifies the ideology of the party each person voted for, as perceived by this same person.
However, the dataset identifies this perceived ideology of each party (as many other variables) by letters A, B, C, etc. Then, when it comes to identify WHICH PARTY each person voted for, it has a UNIQUE CODE NUMBER, that does not correspond to these letters across different years (i.e., the same party can have a different letter in different years – and, of course, it is never the same party across different countries, since each country has its own political parties).
Fictitious data to help clarify, reproduce and create a code:
Let’s say:
country = c(1,1,1,1,2,2,2,2,3,3,3,3)
year = c (2000,2000,2004,2004, 2002,2002,2004,2008,2000,2000,2000,2000)
party_A_number = c(11,11,12,12,21,21,22,23,31,31,31,31)
party_B_number = c(12, 12, 11, 11, 22,22,21,22,32,32,32,32)
party_C_number = c(13,13,13,13,23,23,23,21,33,33,33,33)
party_voted = c(12,13,12,11,21,24,23,22,31,32,33,31)
ideology_party_A <- floor(runif (12, min=1, max=10))
ideology_party_B <- floor(runif (12, min=1, max=10))
ideology_party_C <- floor(runif (12, min=1, max=10))
Let’s call the variable I want to create “ideology_voted”:
I need something like:
IF party_A_number == party_voted, THEN ideology_voted = ideology_party_A
IF party_B_number == party_voted, THEN ideology_voted = ideology_party_B
IF party_C_number == party_voted, THEN ideology_voted = ideology_party_C
The real dataset has 9 letters for (up to) 9 main parties in each country , dozens of countries and election-years. Therefore, it would be great to have a code where I could iterate through letters A-I instead of “if voted party A, then …; if voted party B then….”
Nevertheless, I am having trouble even when I try longer, repetitive codes (one transformation for each party letter - which would give me 8 lines of code)
library(tidyverse)
df <- tibble(
country = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
year = c(2000, 2000, 2004, 2004, 2002, 2002, 2004, 2008, 2000, 2000, 2000, 2000),
party_A_number = c(11, 11, 12, 12, 21, 21, 22, 23, 31, 31, 31, 31),
party_B_number = c(12, 12, 11, 11, 22, 22, 21, 22, 32, 32, 32, 32),
party_C_number = c(13, 13, 13, 13, 23, 23, 23, 21, 33, 33, 33, 33),
party_voted = c(12, 13, 12, 11, 21, 24, 23, 22, 31, 32, 33, 31),
ideology_party_A = floor(runif (12, min = 1, max = 10)),
ideology_party_B = floor(runif (12, min = 1, max = 10)),
ideology_party_C = floor(runif (12, min = 1, max = 10))
)
> df
# A tibble: 12 x 9
country year party_A_number party_B_number party_C_number party_voted ideology_party_A ideology_party_B
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2000 11 12 13 12 9 3
2 1 2000 11 12 13 13 2 6
3 1 2004 12 11 13 12 3 8
4 1 2004 12 11 13 11 7 8
5 2 2002 21 22 23 21 2 7
6 2 2002 21 22 23 24 8 2
7 2 2004 22 21 23 23 1 7
8 2 2008 23 22 21 22 7 7
9 3 2000 31 32 33 31 4 3
10 3 2000 31 32 33 32 7 5
11 3 2000 31 32 33 33 1 6
12 3 2000 31 32 33 31 2 1
# ... with 1 more variable: ideology_party_C <dbl>
It seems you're after conditioning using case_when:
ideology_voted <- df %>% transmute(
ideology_voted = case_when(
party_A_number == party_voted ~ ideology_party_A,
party_B_number == party_voted ~ ideology_party_B,
party_C_number == party_voted ~ ideology_party_C,
TRUE ~ party_voted
)
)
> ideology_voted
# A tibble: 12 x 1
ideology_voted
<dbl>
1 3
2 7
3 3
4 8
5 2
6 24
7 8
8 7
9 4
10 5
11 6
12 2
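Note that case_when checks its conditions in order and uses the first one that is true, so if more than one condition happens to match, the earliest wins. A row with no match (row 6, where party_voted is 24) falls through to the TRUE branch and gets party_voted itself; write TRUE ~ NA_real_ there if you would rather get a missing value.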
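To iterate over the party letters instead of writing one condition per party, a loop over the column-name suffixes is one option. The sketch below is base R and assumes the column naming pattern from the question (party_X_number, ideology_party_X). The ideology values for parties A and B are copied from the printed tibble; those for party C are invented for illustration, since most of that column is not shown. Unmatched votes get NA.

```r
votes <- data.frame(
  party_A_number   = c(11, 11, 12, 12, 21, 21, 22, 23, 31, 31, 31, 31),
  party_B_number   = c(12, 12, 11, 11, 22, 22, 21, 22, 32, 32, 32, 32),
  party_C_number   = c(13, 13, 13, 13, 23, 23, 23, 21, 33, 33, 33, 33),
  party_voted      = c(12, 13, 12, 11, 21, 24, 23, 22, 31, 32, 33, 31),
  ideology_party_A = c(9, 2, 3, 7, 2, 8, 1, 7, 4, 7, 1, 2),
  ideology_party_B = c(3, 6, 8, 8, 7, 2, 7, 7, 3, 5, 6, 1),
  ideology_party_C = c(5, 7, 4, 2, 6, 3, 8, 1, 9, 5, 6, 4)  # illustrative values
)

party_letters <- c("A", "B", "C")  # use LETTERS[1:9] for parties A to I

votes$ideology_voted <- NA_real_
matched <- rep(FALSE, nrow(votes))  # rows that already found their party

for (l in party_letters) {
  number   <- votes[[paste0("party_", l, "_number")]]
  ideology <- votes[[paste0("ideology_party_", l)]]
  # first match wins; rows with a missing party number never match
  hit <- !matched & !is.na(number) & number == votes$party_voted
  votes$ideology_voted[hit] <- ideology[hit]
  matched <- matched | hit
}

print(votes$ideology_voted)
```

Because the columns are looked up by name, extending to nine parties only means changing party_letters.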

transform matrix such that a factor becomes rowname

I have the following data - it is a dump from a normalized database, but I can not access the database, and the database maintainer insists that this is not necessary.
The obs variable is the unique observation id, a.k.a. the one to "pivot" around
Specifically, I want to go from this olddata to the newdata data frame below:
> olddata
species obs variable value
3 ADFA 1 mean 4
4 ADFA 1 lat 118
5 ADFA 1 lon 49
6 ADFA 1 masl 74
96 HODO 8 mean 18
97 HODO 8 lat 120
98 HODO 8 lon 45
99 HODO 8 masl 36
189 HODO 9 mean 34
190 HODO 9 lat 126
191 HODO 9 lon 12
192 HODO 9 masl 35
I would like to reshape this data frame to look like:
> newdata
species obs mean lat lon masl
1 ADFA 1 4 118 49 74
2 HODO 8 18 120 45 36
3 HODO 9 34 126 12 35
Disclaimer: this has likely been asked before but I am unable to find the question among the many questions related to transforming data frames / matrices
Here are the dataframes for use when reproducing this issue:
olddata <- structure(list(species = c("ADFA", "ADFA", "ADFA", "ADFA", "HODO",
"HODO", "HODO", "HODO", "HODO", "HODO", "HODO", "HODO"), obs = c(1,
1, 1, 1, 8, 8, 8, 8, 9, 9, 9, 9), variable = c("mean", "lat",
"lon", "masl", "mean", "lat", "lon", "masl", "mean", "lat", "lon",
"masl"), value = c(4, 118, 49, 74, 18, 120, 45, 36, 34, 126,
12, 35)), .Names = c("species", "obs", "variable", "value"),
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12"), class = "data.frame")
newdata <- structure(list(species = c("ADFA", "HODO", "HODO"), obs = c(1,
8, 9), mean = c(4, 18, 34), lat = c(118, 120, 126), lon = c(49,
45, 12), masl = c(74, 36, 35)), .Names = c("species", "obs",
"mean", "lat", "lon", "masl"), row.names = c(NA, -3L),
class = "data.frame")
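Use dcast from the reshape2 package, keeping species and obs as id columns and spreading variable into columns:
> library(reshape2)
> dcast(olddata, species + obs ~ variable, value.var = "value")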
species obs lat lon masl mean
1 ADFA 1 118 49 74 4
2 HODO 8 120 45 36 18
3 HODO 9 126 12 35 34
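If reshape2 is not available, the same long-to-wide step can be sketched with base R's reshape(). This is an alternative to the dcast call, not part of the original answer; the name wide is introduced for illustration.

```r
olddata <- data.frame(
  species  = rep(c("ADFA", "HODO", "HODO"), each = 4),
  obs      = rep(c(1, 8, 9), each = 4),
  variable = rep(c("mean", "lat", "lon", "masl"), times = 3),
  value    = c(4, 118, 49, 74, 18, 120, 45, 36, 34, 126, 12, 35)
)

# One row per species/obs pair, one column per level of variable
wide <- reshape(
  olddata,
  idvar     = c("species", "obs"),
  timevar   = "variable",
  direction = "wide"
)

# reshape() names the new columns value.mean, value.lat, ...; strip the prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL

print(wide)
```

Unlike dcast, which sorts the new columns alphabetically here, reshape() keeps them in the order they first appear in variable (mean, lat, lon, masl), which matches the newdata layout in the question.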
