How to remove some NA with respect to 2 groups [duplicate] - r

This question already has an answer here:
R remove groups with only NAs
(1 answer)
Closed 3 years ago.
suppose I have
HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
1 3 NA
1 3 NA
2 1 2
2 1 NA
2 2 NA
2 2 NA
The first column is the household index and the second is the person within each household. I want to remove the rows of any person whose mode is NA in every row. For example, in the first household the mode column for the third person is all NA, so I want to remove that person's rows. The same goes for the second person in the second household.
output:
HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
2 1 2
2 1 NA

library(data.table)
dt[, .SD[ ( !all( is.na( mode ) ) ) ], by= .( HH, PP ) ][]
HH PP mode
1: 1 1 2
2: 1 1 NA
3: 1 1 NA
4: 1 2 2
5: 1 2 2
6: 2 1 2
7: 2 1 NA
sample data
dt <- fread(" HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
1 3 NA
1 3 NA
2 1 2
2 1 NA
2 2 NA
2 2 NA")
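For readers without data.table, the same filter can be sketched in base R. This assumes the data sits in a plain data.frame, here called df (a name chosen for illustration, not taken from the post). ave() marks every row of an (HH, PP) group as TRUE when the group has at least one non-NA mode, and only those rows are kept.

```r
# Sample data from the question as a plain data.frame (df is an illustrative name)
df <- data.frame(
  HH   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  PP   = c(1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2),
  mode = c(2, NA, NA, 2, 2, NA, NA, 2, NA, NA, NA)
)

# TRUE for every row whose (HH, PP) group contains at least one non-NA mode
keep <- ave(!is.na(df$mode), df$HH, df$PP, FUN = any)

# Drop the groups where mode is NA throughout
res <- df[keep, ]
res
```

Rows with NA in mode survive as long as their person has at least one observed mode, which is what the expected output in the question asks for.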

Related

How to make the next number in a column a sequence in r

Sorry to bother everyone. I have been stuck on some code. My data looks like this:
Student Number
1 NA
1 NA
1 1
1 1
2 NA
2 1
2 1
2 1
3 NA
3 NA
3 1
3 1
I tried using dplyr to group by student, and I am trying to find a way so that every time it reads a 1, it adds it to the running total for that student, so the column would read as
Student Number
1 NA
1 NA
1 1
1 2
2 NA
2 1
2 2
2 3
3 NA
3 NA
3 1
3 2
etc
Thank you! It'd help with attendance.
A data.table solution:
library(data.table)
setDT(df)
df[!is.na(Number),Number:=cumsum(Number),by=Student]
df
Student Number
<int> <int>
1 1 NA
2 1 NA
3 1 1
4 1 2
5 2 NA
6 2 1
7 2 2
8 2 3
9 3 NA
10 3 NA
11 3 1
12 3 2
Try using cumsum. Note that cumsum itself cannot ignore NA, so the NAs are replaced by 0 for the running sum and then restored by adding 0 * Number:
library(dplyr)
df %>%
group_by(Student) %>%
mutate(n = cumsum(ifelse(is.na(Number), 0, Number)) + 0 * Number)
Student Number n
<int> <int> <dbl>
1 1 NA NA
2 1 NA NA
3 1 1 1
4 1 1 2
5 2 NA NA
6 2 1 1
7 2 1 2
8 2 1 3
9 3 NA NA
10 3 NA NA
11 3 1 1
12 3 1 2
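The same running count can also be sketched in base R, for anyone avoiding extra packages. The data.frame below rebuilds the question's data under the name df, and the result column n is an illustrative name. Inside each student's block, only the non-NA entries are replaced by their cumulative sum, so the NAs stay where they are.

```r
# Data from the question (Number is NA until attendance starts)
df <- data.frame(
  Student = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  Number  = c(NA, NA, 1, 1, NA, 1, 1, 1, NA, NA, 1, 1)
)

# Per student: cumulative sum over the non-NA values only
df$n <- ave(df$Number, df$Student, FUN = function(x) {
  ok <- !is.na(x)
  x[ok] <- cumsum(x[ok])
  x
})
df
```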

R Insert Value within Dataframe

I have a fairly complex problem and hope someone can help. I want to copy a row value (i.e. of Player 1 or Player 2) into two other rows (for Players 3 and 4) if and only if these players are in the same Treatment, Group and Period AND this player was indeed picked (see column Player.Picked).
I know that with tidyverse I can group_by my columns of interest: Treatment, Group, and Period.
However, I am unsure how to proceed with the condition that Player Picked is fulfilled and then how to extract this value appropriately for the players 3 and 4 in the same treatment, group, period.
The column "extracted.Player 1/2 Value" should be the output. (I have manually provided the first four correct solutions).
Any ideas? Help would be very much appreciated. Thanks a lot in advance!
df
T Player Group Player.Picked Period Player1/2Value extracted.Player1/2Value
1 1 6 1 1 10
1 2 6 1 1 9
1 3 5 2 1 NA -> 4
1 4 6 1 1 NA -> 10
1 5 3 1 1 NA
1 1 5 2 1 8
1 2 1 0 1 7
1 3 6 1 1 NA -> 10
1 4 2 2 1 NA
1 5 2 2 1 NA
1 1 1 0 1 7
1 2 2 2 1 11
1 3 3 1 1 NA
1 4 4 1 1 NA
1 5 4 1 1 NA
1 1 2 2 1 21
1 2 4 1 1 17
1 3 1 0 1 NA
1 4 5 2 1 NA -> 4
1 5 6 1 1 NA
1 1 3 1 1 12
1 2 3 1 1 15
1 3 4 1 1 NA
1 4 1 0 1 NA
1 5 1 0 1 NA
1 1 4 1 1 11
1 2 5 2 1 4
1 3 2 2 1 NA
1 4 3 1 1 NA
1 5 5 2 1 NA
I'm not sure if I understood the required logic; here I'm assuming that Player 5 always picks Player 1 or 2 per Group.
So, here is my go at this using library(data.table):
library(data.table)
DT <- data.table::data.table(
check.names = FALSE,
T = c(1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L),
Player = c(1L,2L,3L,
4L,5L,1L,2L,3L,4L,5L,1L,2L,3L,4L,5L,
1L,2L,3L,4L,5L,1L,2L,3L,4L,5L,1L,
2L,3L,4L,5L),
Group = c(6L,6L,5L,
6L,3L,5L,1L,6L,2L,2L,1L,2L,3L,4L,4L,
2L,4L,1L,5L,6L,3L,3L,4L,1L,1L,4L,
5L,2L,3L,5L),
Player.Picked = c(1L,1L,2L,
1L,1L,2L,0L,1L,2L,2L,0L,2L,1L,1L,1L,
2L,1L,0L,2L,1L,1L,1L,1L,1L,0L,0L,
1L,2L,2L,2L),
Period = c(1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,1L,
1L,1L,1L,1L),
`Player1/2Value` = c(10L,9L,NA,
NA,NA,8L,7L,NA,NA,NA,7L,11L,NA,NA,
NA,21L,17L,NA,NA,NA,12L,15L,NA,NA,NA,
11L,4L,NA,NA,NA),
`extracted.Player1/2Value` = c(NA,NA,4L,
10L,NA,NA,NA,10L,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA,4L,NA,NA,NA,NA,NA,NA,NA,
NA,NA,NA,NA)
)
setorderv(DT, cols = c("T", "Group", "Period", "Player"))
Player5PickedDT <- DT[Player == 5, Player.Picked, by = c("T", "Group", "Period")]
setnames(Player5PickedDT, old = "Player.Picked", new = "Player5Picked")
DT <- DT[Player5PickedDT, on = c("T", "Group", "Period")]
extractedDT <- DT[Player == Player5Picked & Player5Picked > 0, `Player1/2Value`, by = c("T", "Group", "Period")]
setnames(extractedDT, old = "Player1/2Value", new = "extractedValue")
DT[, "Player5Picked" := NULL]
DT <- extractedDT[DT, on = c("T", "Group", "Period")]
# extractedValue is an integer column and fifelse() expects yes and no to have the same type
DT[, extractedValue := fifelse(Player %in% c(3, 4), yes = extractedValue, no = NA_integer_)]
setcolorder(DT, c("T", "Group", "Period", "Player", "Player.Picked", "Player1/2Value", "extracted.Player1/2Value", "extractedValue"))
DT
The resulting table differs from your expected result (extracted.Player1/2Value vs. extractedValue), but in my eyes it follows the explained logic:
T Group Period Player Player.Picked Player1/2Value extracted.Player1/2Value extractedValue
1: 1 1 1 1 0 7 NA NA
2: 1 1 1 2 0 7 NA NA
3: 1 1 1 3 0 NA NA NA
4: 1 1 1 4 1 NA NA NA
5: 1 1 1 5 0 NA NA NA
6: 1 2 1 1 2 21 NA NA
7: 1 2 1 2 2 11 NA NA
8: 1 2 1 3 2 NA NA 11
9: 1 2 1 4 2 NA NA 11
10: 1 2 1 5 2 NA NA NA
11: 1 3 1 1 1 12 NA NA
12: 1 3 1 2 1 15 NA NA
13: 1 3 1 3 1 NA NA 12
14: 1 3 1 4 2 NA NA 12
15: 1 3 1 5 1 NA NA NA
16: 1 4 1 1 0 11 NA NA
17: 1 4 1 2 1 17 NA NA
18: 1 4 1 3 1 NA NA 11
19: 1 4 1 4 1 NA NA 11
20: 1 4 1 5 1 NA NA NA
21: 1 5 1 1 2 8 NA NA
22: 1 5 1 2 1 4 NA NA
23: 1 5 1 3 2 NA 4 4
24: 1 5 1 4 2 NA 4 4
25: 1 5 1 5 2 NA NA NA
26: 1 6 1 1 1 10 NA NA
27: 1 6 1 2 1 9 NA NA
28: 1 6 1 3 1 NA 10 10
29: 1 6 1 4 1 NA 10 10
30: 1 6 1 5 1 NA NA NA
T Group Period Player Player.Picked Player1/2Value extracted.Player1/2Value extractedValue
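The logic can also be sketched group by group in base R, under the same assumption as above, namely that Player 5's pick decides whose value is copied. The sketch uses only Groups 6 and 1 from the data above, and Value is shorthand (not in the original) for the Player1/2Value column; extract_group is a hypothetical helper name.

```r
# Two groups taken from the data above; Value stands in for `Player1/2Value`
df <- data.frame(
  T = 1L, Period = 1L,
  Group = rep(c(6L, 1L), each = 5),
  Player = rep(1:5, times = 2),
  Player.Picked = c(1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 1L, 0L),
  Value = c(10L, 9L, NA, NA, NA, 7L, 7L, NA, NA, NA)
)

# Hypothetical helper: look up Player 5's pick within one group and
# copy the picked player's value to Players 3 and 4
extract_group <- function(g) {
  pick <- g$Player.Picked[g$Player == 5L]
  val <- if (length(pick) == 1L && pick > 0L) {
    g$Value[g$Player == pick][1L]
  } else {
    NA_integer_  # nobody picked (0) or no Player 5 in this group
  }
  g$extractedValue <- ifelse(g$Player %in% c(3L, 4L), val, NA_integer_)
  g
}

# Apply per Treatment / Group / Period combination and stack the pieces
parts <- split(df, list(df$T, df$Group, df$Period), drop = TRUE)
res <- do.call(rbind, lapply(parts, extract_group))
res
```

In Group 6 Player 5 picked Player 1, so Players 3 and 4 receive 10; in Group 1 the pick is 0, so everything stays NA.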

Rearranging columns with NAs [duplicate]

This question already has answers here:
How to move cells with a value row-wise to the left in a dataframe [duplicate]
(5 answers)
Closed 4 years ago.
Sorry guys,
this is probably a silly question, but I have not managed to find a quick solution to this issue.
I have a dataframe of this form, indicating the number of members in each household and the gender of each member
Familyid Gender_1 Gender_2 Gender_3 Gender_4 Ncomponent
1 1 NA NA NA 1
2 NA 1 NA NA 1
3 1 2 NA NA 2
4 1 NA 2 NA 2
5 NA 1 2 NA 2
6 2 NA NA 1 2
I would like to collect this info just in two columns in the following way.
Familyid Gender_member1 Gender_member2 Ncomponent
1 1 NA 1
2 1 NA 1
3 1 2 2
4 1 2 2
5 1 2 2
6 2 1 2
In other words, I want to create a column indicating the gender of member 1, regardless of which column he/she is located in in my original dataframe, and a different one indicating the gender of the second family member, whenever the latter exists.
Can anyone help me out with this?
Marco
I just removed NAs for Gender_x columns.
xy <- read.table(text = "Familyid Gender_1 Gender_2 Gender_3 Gender_4 Ncomponent
1 1 NA NA NA 1
2 NA 1 NA NA 1
3 1 2 NA NA 2
4 1 NA 2 NA 2
5 NA 1 2 NA 2
6 2 NA NA 1 2",
header = TRUE)
xy
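fetch.gender <- grepl("^Gender_\\d{1}$", names(xy))
# Keep the non-NA genders of each row and pad to two members with NA;
# rbind-ing vectors of unequal length would silently recycle the single value.
out <- apply(xy[, fetch.gender], MARGIN = 1, FUN = function(x) {
  x <- x[!is.na(x)]
  length(x) <- 2
  x
})
out <- t(out)
colnames(out) <- c("Gender_member1", "Gender_member2")
data.frame(Familyid = xy$Familyid, out, Ncomponent = xy$Ncomponent)
Familyid Gender_member1 Gender_member2 Ncomponent
1 1 1 NA 1
2 2 1 NA 1
3 3 1 2 2
4 4 1 2 2
5 5 1 2 2
6 6 2 1 2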

Create a counting variable which I can use to group my unemployment data in R

I have data as below, where I created the variable "B" with the following code:
index <- which(Count$unemploymentduration ==1)
Count$B[index]<-1:length(index)
ID unemployment B
1 1 1
1 2 NA
1 3 NA
1 4 NA
2 1 2
2 2 NA
2 0 NA
2 1 3
2 2 NA
2 3 NA
2 4 NA
2 5 NA
I want my data to look like this and have no real idea how to get there.
I thought of an "if-function" but have never used one in R.
ID unemployment B
1 1 1
1 2 1
1 3 1
1 4 1
2 1 2
2 2 2
2 0 2
2 1 3
2 2 3
2 3 3
2 4 3
2 5 3
Could someone help me out?
We can use na.locf from library(zoo)
library(zoo)
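# na.rm = FALSE keeps any leading NAs, so the result always has the same length as the column
Count$B <- na.locf(Count$B, na.rm = FALSE)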
But, this can be created directly without using an 'index'
Count$B <- cumsum(Count$unemployment==1)
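To see why the cumsum version works, here is a small self-contained sketch on the data from the question (the column is named unemployment as in the table shown there). The comparison yields TRUE at the start of every spell, and cumsum counts those TRUEs as 1s, so each row carries the number of spells started so far.

```r
# Data from the question
Count <- data.frame(
  ID           = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  unemployment = c(1, 2, 3, 4, 1, 2, 0, 1, 2, 3, 4, 5)
)

# TRUE marks the first month of a spell; the running sum numbers the spells
Count$B <- cumsum(Count$unemployment == 1)
Count
```

Note that this numbers spells across the whole data set rather than restarting within each ID, which matches the desired output in the question.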

How to create a count variable by group for specific values in the variable of interest?

At the moment I have to deal with paradata (long-format) generated by a software during the data collection phase of a cohort study.
How can I create a variable containing the number of occurrences of a certain value by a group variable (like by id: gen _n if VAR1==2 in Stata)?
Basically the data looks like this:
ID: VAR1:
1 2
1 1
1 2
2 2
2 3
2 2
3 1
3 2
3 2
I can create a variable count.1 using
`data$count.1 <- ave(data$VAR1, data$ID, FUN = seq_along)`
ID: VAR1: count.1:
1 2 1
1 1 2
1 2 3
2 2 1
2 3 2
2 2 3
3 1 1
3 2 2
3 2 3
How can I create a variable count.2 that counts, by ID, the occurrences of the value 2 in VAR1?
ID: VAR1: count.1: count.2:
1 2 1 1
1 1 2 NA
1 2 3 2
2 2 1 1
2 3 2 NA
2 2 3 2
3 1 1 NA
3 2 2 1
3 2 3 2
The Data:
ID=c(1,1,1,2,2,2,3,3,3)
VAR1=c(2,1,2,2,3,2,1,2,2)
data <- as.data.frame(cbind(ID, VAR1))
Thanks in advance!!!
Try
data$count.2 <- with(data, ave(VAR1==2, ID,
FUN=function(x) ifelse(x, cumsum(x), NA)) )
data$count.2
#[1] 1 NA 2 1 NA 2 NA 1 2
Or using data.table
library(data.table)
setDT(data)[VAR1==2, count.2:=1:.N, by=ID][]
# ID VAR1 count.2
#1: 1 2 1
#2: 1 1 NA
#3: 1 2 2
#4: 2 2 1
#5: 2 3 NA
#6: 2 2 2
#7: 3 1 NA
#8: 3 2 1
#9: 3 2 2
Or using dplyr
library(dplyr)
data %>%
group_by(ID) %>%
mutate(count.2= ifelse(VAR1==2, cumsum(VAR1==2), NA))
