Rearranging columns with NAs [duplicate] - r

Sorry guys,
this is probably a silly question, but I can't find a quick solution to this issue.
I have a dataframe of this form, indicating the number of members of each household and the gender of each member:
Familyid Gender_1 Gender_2 Gender_3 Gender_4 Ncomponent
1 1 NA NA NA 1
2 NA 1 NA NA 1
3 1 2 NA NA 2
4 1 NA 2 NA 2
5 NA 1 2 NA 2
6 2 NA NA 1 2
I would like to collect this information in just two columns, in the following way:
Familyid Gender_member1 Gender_member2 Ncomponent
1 1 NA 1
2 1 NA 1
3 1 2 2
4 1 2 2
5 1 2 2
6 2 1 2
In other words, I want to create one column indicating the gender of member 1, regardless of which column he/she is located in in my original dataframe, and a second one indicating the gender of the second family member, whenever the latter exists.
Can anyone help me out with this?
Marco

I just removed the NAs from the Gender_x columns, row by row.
xy <- read.table(text = "Familyid Gender_1 Gender_2 Gender_3 Gender_4 Ncomponent
1 1 NA NA NA 1
2 NA 1 NA NA 1
3 1 2 NA NA 2
4 1 NA 2 NA 2
5 NA 1 2 NA 2
6 2 NA NA 1 2",
header = TRUE)
xy
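# Logical index of the Gender_1 ... Gender_4 columns
fetch.gender <- grepl("^Gender_\\d{1}$", names(xy))
# Drop the NAs in each row, then pad to two values so that every row has
# the same length (single-member households get NA for the second member)
out <- apply(xy[, fetch.gender], MARGIN = 1,
FUN = function(x) unname(x[!is.na(x)])[1:2])
# apply() returns one column per row of xy, so transpose back
out <- t(out)
colnames(out) <- c("Gender_member1", "Gender_member2")
data.frame(Familyid = xy$Familyid, out, Ncomponent = xy$Ncomponent)
Familyid Gender_member1 Gender_member2 Ncomponent
1 1 1 NA 1
2 2 1 NA 1
3 3 1 2 2
4 4 1 2 2
5 5 1 2 2
6 6 2 1 2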
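
The same result can also be reached by shifting the non-missing values of each row to the left and keeping only the first two columns. The following is a minimal base R sketch of that idea, not the answerer's code: the helper name shift_left is made up for illustration, and the data frame is rebuilt by hand from the table in the question.

```r
# Example data, typed in from the question
xy <- data.frame(
  Familyid   = 1:6,
  Gender_1   = c(1, NA, 1, 1, NA, 2),
  Gender_2   = c(NA, 1, 2, NA, 1, NA),
  Gender_3   = c(NA, NA, NA, 2, 2, NA),
  Gender_4   = c(NA, NA, NA, NA, NA, 1),
  Ncomponent = c(1, 1, 2, 2, 2, 2)
)

# Hypothetical helper: move the non-NA values of a vector to the front
# and put the NAs at the end, so the length stays the same
shift_left <- function(x) c(x[!is.na(x)], x[is.na(x)])

gender_cols <- grep("^Gender_\\d$", names(xy))
# apply() returns one column per input row, hence the transpose
shifted <- t(apply(xy[, gender_cols], 1, shift_left))

res <- data.frame(
  Familyid       = xy$Familyid,
  Gender_member1 = as.numeric(shifted[, 1]),
  Gender_member2 = as.numeric(shifted[, 2]),
  Ncomponent     = xy$Ncomponent
)
res
```

Because every row keeps its original length after the shift, no padding step is needed, and households with more than two members would only lose the columns that are deliberately dropped.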

Related

How to make the next number in a column a sequence in r

Sorry to bother everyone. I have been stuck coding this. My data look like this:
Student Number
1 NA
1 NA
1 1
1 1
2 NA
2 1
2 1
2 1
3 NA
3 NA
3 1
3 1
I tried using dplyr to group by student, and I am trying to find a way so that every time it reads a 1 it adds it to a running total for that student, so the column would read as:
Student Number
1 NA
1 NA
1 1
1 2
2 NA
2 1
2 2
2 3
3 NA
3 NA
3 1
3 2
etc
Thank you! It'd help with attendance.
A data.table solution:
library(data.table)
setDT(df)
df[!is.na(Number),Number:=cumsum(Number),by=Student]
df
Student Number
<int> <int>
1 1 NA
2 1 NA
3 1 1
4 1 2
5 2 NA
6 2 1
7 2 2
8 2 3
9 3 NA
10 3 NA
11 3 1
12 3 2
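Try using cumsum. Note that cumsum cannot skip NA on its own, so the NAs are replaced with 0 inside cumsum and then restored by adding 0 * Number (which is NA wherever Number is NA):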
library(dplyr)
df %>%
group_by(Student) %>%
mutate(n = cumsum(ifelse(is.na(Number), 0, Number)) + 0 * Number)
Student Number n
<int> <int> <dbl>
1 1 NA NA
2 1 NA NA
3 1 1 1
4 1 1 2
5 2 NA NA
6 2 1 1
7 2 1 2
8 2 1 3
9 3 NA NA
10 3 NA NA
11 3 1 1
12 3 1 2
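
To see why the NA handling is needed, and how the same grouped running total could be written without extra packages, here is a small base R sketch. The helper name cumsum_skip_na is an assumption made for this example, and the data frame is typed in from the question.

```r
# Attendance data, typed in from the question
df <- data.frame(
  Student = rep(1:3, each = 4),
  Number  = c(NA, NA, 1, 1,  NA, 1, 1, 1,  NA, NA, 1, 1)
)

# cumsum() propagates NA: once an NA is seen, everything after it is NA
plain <- cumsum(c(NA, 1, 1))
plain

# Hypothetical helper: running total that treats NA as 0 while summing
# but leaves NA in the positions where the input was NA
cumsum_skip_na <- function(x) {
  out <- cumsum(ifelse(is.na(x), 0, x))
  out[is.na(x)] <- NA
  out
}

# ave() applies the helper separately within each Student group
df$n <- ave(df$Number, df$Student, FUN = cumsum_skip_na)
df
```

ave() returns a vector of the same length as its input, so the result can be assigned straight back as a new column.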

how to remove Some NA with respect of 2 groups [duplicate]

Suppose I have:
HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
1 3 NA
1 3 NA
2 1 2
2 1 NA
2 2 NA
2 2 NA
The first column is the household index and the second is the person within each household. I want to remove the rows of every person whose mode is NA in all of their rows. For example, in the first household the mode column for the third person is all NA, so I want to remove that person's rows. The same goes for the second person in the second household.
Output:
HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
2 1 2
2 1 NA
library(data.table)
dt[, .SD[ ( !all( is.na( mode ) ) ) ], by= .( HH, PP ) ][]
HH PP mode
1: 1 1 2
2: 1 1 NA
3: 1 1 NA
4: 1 2 2
5: 1 2 2
6: 2 1 2
7: 2 1 NA
sample data
dt <- fread(" HH PP mode
1 1 2
1 1 NA
1 1 NA
1 2 2
1 2 2
1 3 NA
1 3 NA
2 1 2
2 1 NA
2 2 NA
2 2 NA")
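
For comparison, the same filter can be sketched in base R with ave(), which computes a per-group flag and repeats it for every row of the group. This is an illustrative alternative rather than part of the answer above; the data are typed in from the question as a plain data.frame.

```r
# Sample data as a plain data.frame
dt <- data.frame(
  HH   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  PP   = c(1, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2),
  mode = c(2, NA, NA, 2, 2, NA, NA, 2, NA, NA, NA)
)

# For every row: TRUE if its (HH, PP) group has at least one non-NA mode
keep <- ave(!is.na(dt$mode), dt$HH, dt$PP, FUN = any)

# Keep only the rows of groups that are not entirely NA
res <- dt[keep, ]
res
```

Rows that are NA themselves are kept as long as some other row of the same person has a value, which matches the requested output.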

Grouping in Embedded Group Structures in R data.table

I have a data.table object that looks like this:
FamilyID InterFamilyID MumInFamilyID Edu
1 1 NA 2
1 2 NA 5
1 3 2 3
2 1 NA 6
2 2 1 9
2 2 1 3
I want to perform a query like this one:
tbl1[, MumEdu:= Edu[InterFamilyID == MumInFamilyID], by=FamilyID]
to get something like this:
FamilyID InterFamilyID MumInFamilyID Edu MumEdu
1 1 NA 2 NA
1 2 NA 5 NA
1 3 2 3 5
2 1 NA 6 NA
2 2 1 9 6
2 2 1 3 6
In other words, the data.table is grouped by one column (FamilyID), and within each of these groups the members are identified by another column (InterFamilyID). A third column (MumInFamilyID) holds the within-family ID of another member of the same group. I want to use these values to access values in the referenced rows.
You can use match, which, according to its documentation, "returns a vector of the positions of (first) matches of its first argument in its second", and then use the resulting positions to pick the corresponding elements of the Edu column:
tbl1[, MumEdu := Edu[match(MumInFamilyID, InterFamilyID)], by = FamilyID]
tbl1
# FamilyID InterFamilyID MumInFamilyID Edu MumEdu
#1: 1 1 NA 2 NA
#2: 1 2 NA 5 NA
#3: 1 3 2 3 5
#4: 2 1 NA 6 NA
#5: 2 2 1 9 6
#6: 2 2 1 3 6
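
What match does inside each group can be seen on plain vectors. The sketch below uses the first family from the question; the standalone vectors are only for illustration and mirror the column names of tbl1.

```r
# First family from the question: member 3's mother is member 2
InterFamilyID <- c(1, 2, 3)
MumInFamilyID <- c(NA, NA, 2)
Edu           <- c(2, 5, 3)

# Position of each mother's ID within InterFamilyID; NA where there is none
pos <- match(MumInFamilyID, InterFamilyID)
pos

# Indexing with NA yields NA, so members without a recorded mother get NA
MumEdu <- Edu[pos]
MumEdu
```

By contrast, InterFamilyID == MumInFamilyID compares the two columns element by element, each row only with itself, so it cannot find a mother who sits in a different row.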

Create a counting variable which I can use to group my unemployment data in R

I have data as below, where I created the variable "B" with the following code:
index <- which(Count$unemploymentduration ==1)
Count$B[index]<-1:length(index)
ID unemployment B
1 1 1
1 2 NA
1 3 NA
1 4 NA
2 1 2
2 2 NA
2 0 NA
2 1 3
2 2 NA
2 3 NA
2 4 NA
2 5 NA
I want my data to look like this, and I have no real idea how to get there.
I thought of an "if" function, but I have never used one in R.
ID unemployment B
1 1 1
1 2 1
1 3 1
1 4 1
2 1 2
2 2 2
2 0 2
2 1 3
2 2 3
2 3 3
2 4 3
2 5 3
Could someone help me out?
We can use na.locf from library(zoo)
library(zoo)
Count$B <- na.locf(Count$B)
But B can also be created directly, without building an index first:
Count$B <- cumsum(Count$unemployment==1)
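
A short sketch of why the cumsum line works, using the data from the question typed in by hand: the comparison yields TRUE at the start of every spell, and summing logicals counts the TRUEs seen so far.

```r
# Data from the question (without the B column)
Count <- data.frame(
  ID           = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  unemployment = c(1, 2, 3, 4, 1, 2, 0, 1, 2, 3, 4, 5)
)

# TRUE marks the first row of each unemployment spell
starts <- Count$unemployment == 1

# TRUE counts as 1 and FALSE as 0, so the running sum is a spell counter
Count$B <- cumsum(starts)
Count$B
```

Note that the counter runs across the whole column and is not restarted per ID, which is what the desired output in the question shows.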

creating new variable with ifelse with NA's

I have a data frame that consists of the answers to the question:
"What language do you speak at home?"
1=English
2=Spanish
and so on...
first.language second.language third.language fourth.language
1 1 NA NA NA
2 1 2 NA NA
3 1 2 NA NA
4 2 NA NA NA
5 1 2 NA NA
6 1 5 NA NA
What I want to do is create the variable "english.home", which will be:
1 = if English is spoken at home, no matter whether it is the first, second, ... language
2 (else) = if English is not spoken at home.
I tried using:
student1$english.home = ifelse(student1$first.language==1 |
student1$second.language==1 | student1$third.language==1 |
student1$fourth.language==1,1,2)
But I got:
> english.home
1 1
2 1
3 1
4 NA
5 1
6 1
Is there any way of accomplishing this without getting an NA on row number four? It really doesn’t matter that the other languages are NA; what matters is that none of them is English!
I know that the ifelse/NA topic has been much debated. I searched a lot for a solution before posting but could not find one.
Hope someone can help me out of this mess.
Something like this should do what you want.
# Read your data
tab <- read.table(text ="
first.language second.language third.language fourth.language
1 1 NA NA NA
2 1 2 NA NA
3 1 2 NA NA
4 2 NA NA NA
5 1 2 NA NA
6 1 5 NA NA")
tab$english.home <-
apply(tab, 1, function (x) 2 - any(x == 1, na.rm = TRUE))
print(tab)
# first.language second.language third.language fourth.language english.home
#1 1 NA NA NA 1
#2 1 2 NA NA 1
#3 1 2 NA NA 1
#4 2 NA NA NA 2
#5 1 2 NA NA 1
#6 1 5 NA NA 1
We use the fact that logical values get promoted to numeric 0 and 1 when they are added to (or subtracted from) a number.
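
The two ingredients can be checked in isolation on row 4 of the data (Spanish only). This is a small illustrative sketch; the variable names are made up for the example.

```r
# Row 4 of the question's data: Spanish as the only language
x <- c(2, NA, NA, NA)

# Chained | gives NA when no comparison is TRUE and at least one is NA,
# which is why the ifelse() in the question returned NA for this row
chained <- x[1] == 1 | x[2] == 1 | x[3] == 1 | x[4] == 1
chained

# any() with na.rm = TRUE ignores the NA comparisons
spoken <- any(x == 1, na.rm = TRUE)
spoken

# Logical values act as 0/1 in arithmetic: 2 - FALSE is 2, 2 - TRUE is 1
not_english <- 2 - spoken
english     <- 2 - TRUE
c(not_english, english)
```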
Maybe this helps:
(!rowSums(student1==1 & !is.na(student1))) +1
#1 2 3 4 5 6
#1 1 1 2 1 1
