R: Merge dataframes not aligning properly - r

I am trying to merge two data frames:
df1
Day Hour S_Value
1 1 1
1 1 2
1 1 3
1 1 1
1 2 5
1 2 6
1 2 4
1 2 2
df2
Day Hour n_Value
1 1 3
1 1 7
1 1 na
1 1 na
1 2 1
1 2 9
1 2 na
1 2 na
I used:
join <- merge(df1, df2, by = "Hour")
and got this data frame, in which the first S_Value of each Hour is repeated for every row with that Hour:
join
Day Hour S_Value N_Value
1 1 1 3
1 1 1 7
1 1 1 na
1 1 1 na
1 2 5 1
1 2 5 9
1 2 5 na
1 2 5 na
I want it to look like this:
join
Day Hour S_Value N_Value
1 1 1 3
1 1 2 7
1 1 3 na
1 1 1 na
1 2 5 1
1 2 6 9
1 2 4 na
1 2 2 na
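One possible approach, sketched under the assumption that the rows of df1 and df2 are meant to pair up by their position within each Day/Hour group. Merging on Hour alone matches every df1 row with every df2 row that shares that Hour, so a per-group row counter is added to both frames and included in the merge keys. The column name idx is a made-up helper, and the na entries are written as NA.

```r
# Rebuild the two example frames from the question
df1 <- data.frame(Day = 1, Hour = rep(1:2, each = 4),
                  S_Value = c(1, 2, 3, 1, 5, 6, 4, 2))
df2 <- data.frame(Day = 1, Hour = rep(1:2, each = 4),
                  n_Value = c(3, 7, NA, NA, 1, 9, NA, NA))

# idx (hypothetical helper): position of each row within its Day/Hour group
df1$idx <- ave(seq_len(nrow(df1)), df1$Day, df1$Hour, FUN = seq_along)
df2$idx <- ave(seq_len(nrow(df2)), df2$Day, df2$Hour, FUN = seq_along)

# Merge on all three keys so each row has exactly one partner
joined <- merge(df1, df2, by = c("Day", "Hour", "idx"))
joined <- joined[order(joined$Day, joined$Hour, joined$idx), ]
joined$idx <- NULL  # drop the helper column
joined
```

If the two frames are already in the same row order, cbind(df1, n_Value = df2$n_Value) would give the same result without a merge.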


How to create two columns based on some criteria in R

My data is similar to the data below.
A=01-03
B=04-06
C=07-09
D=10-11
data<-read.table (text=" ID Class Time1 Time2 Time3
1 1 1 3 3
2 1 4 3 2
3 1 2 2 2
1 2 1 4 1
2 3 2 1 1
3 2 3 2 3
1 3 1 1 2
2 2 4 3 1
3 3 3 2 1
1 1 4 3 2
2 1 2 2 2
3 2 1 4 1
", header=TRUE)
I want to create two columns right after the Class column, Bin and Zero, based on A, B, C and D and the IDs.
A goes to the first block of IDs 1, 2 and 3; B goes to the next block of IDs 1, 2 and 3; C goes to the block after that, and so on. The Zero column contains only zeros. So the outcome would be:
ID Class Bin Zero Time1 Time2 Time3
1 1 01-03 0 1 3 3
2 1 01-03 0 4 3 2
3 1 01-03 0 2 2 2
1 2 04-06 0 1 4 1
2 3 04-06 0 2 1 1
3 2 04-06 0 3 2 3
1 3 07-09 0 1 1 2
2 2 07-09 0 4 3 1
3 3 07-09 0 3 2 1
1 1 10-11 0 4 3 2
2 1 10-11 0 2 2 2
3 2 10-11 0 1 4 1
Try the code below:
library(tidyverse)
# Store the bin labels as quoted character strings
A='01-03'
B='04-06'
C='07-09'
D='10-11'
data<-read.table (text=" ID Class Time1 Time2 Time3
1 1 1 3 3
2 1 4 3 2
3 1 2 2 2
1 2 1 4 1
2 3 2 1 1
3 2 3 2 3
1 3 1 1 2
2 2 4 3 1
3 3 3 2 1
1 1 4 3 2
2 1 2 2 2
3 2 1 4 1
", header=TRUE)
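# Build the Bin column by repeating each label three times
data2 <- data.frame(Bin = rep(c(A, B, C, D), each = 3))
# Append Bin, add Zero, then move both so they sit right after Class
data3 <- bind_cols(data, data2) %>%
  mutate(Zero = 0) %>%
  relocate(Bin, Zero, .after = Class)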
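If you are open to a dplyr-based solution, you could use
library(dplyr)
# Assumes A, B, C and D are character strings and each ID occurs exactly four times, once per block
data %>%
  group_by(ID) %>%
  mutate(Bin = c(A, B, C, D),
         Zero = 0,
         .after = 2) %>%
  ungroup()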
This returns
# A tibble: 12 × 7
ID Class Bin Zero Time1 Time2 Time3
<int> <int> <chr> <dbl> <int> <int> <int>
1 1 1 01-03 0 1 3 3
2 2 1 01-03 0 4 3 2
3 3 1 01-03 0 2 2 2
4 1 2 04-06 0 1 4 1
5 2 3 04-06 0 2 1 1
6 3 2 04-06 0 3 2 3
7 1 3 07-09 0 1 1 2
8 2 2 07-09 0 4 3 1
9 3 3 07-09 0 3 2 1
10 1 1 10-11 0 4 3 2
11 2 1 10-11 0 2 2 2
12 3 2 10-11 0 1 4 1
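For comparison, a base R sketch of the same idea. It assumes, as in the sample data, that every block starts with ID 1, so a running count of ID == 1 gives the block number. The names block, bins and out are illustrative only.

```r
A <- "01-03"; B <- "04-06"; C <- "07-09"; D <- "10-11"
data <- read.table(text = "ID Class Time1 Time2 Time3
1 1 1 3 3
2 1 4 3 2
3 1 2 2 2
1 2 1 4 1
2 3 2 1 1
3 2 3 2 3
1 3 1 1 2
2 2 4 3 1
3 3 3 2 1
1 1 4 3 2
2 1 2 2 2
3 2 1 4 1", header = TRUE)

# Assumption: a new block begins whenever ID is 1
block <- cumsum(data$ID == 1)
bins <- c(A, B, C, D)

# Reassemble the frame with Bin and Zero placed right after Class
out <- data.frame(data[1:2], Bin = bins[block], Zero = 0, data[3:5],
                  stringsAsFactors = FALSE)
out
```

Unlike rep(..., each = 3), this does not hard-code the block length, but it does break if a block is missing its ID 1 row.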

How to fill NA in a special way

I have three columns: two are grouping columns (group1 and group2), and the third (act) is either NA or one of a, b or c.
Within each combination of group1 and group2, I want to create a new column of numbers as follows:
the first row of each combination always holds a letter, and the numbering starts at 1 there. After each later letter (a, b or c), the value increases by 1, starting from the row after that letter.
Example:
group1 group2 act
1 1 a
1 1 NA
1 1 NA
1 1 b
1 1 NA
1 1 a
1 1 NA
1 1 a
1 2 a
1 2 NA
1 2 a
2 1 a
2 1 NA
2 1 b
2 1 b
2 1 NA
2 1 a
Desired output:
group1 group2 act New
1 1 a 1
1 1 NA 1
1 1 NA 1
1 1 b 1
1 1 NA 2
1 1 a 2
1 1 NA 3
1 1 a 3
1 2 a 1
1 2 NA 1
1 2 a 1
2 1 a 1
2 1 NA 1
2 1 b 1
2 1 b 2
2 1 NA 3
2 1 a 3
If the pattern is not clear, please ask and I will explain further.
On the grouped data, cumsum(!is.na(act)) increments by 1 at each non-NA value. lag() then shifts that running count down one row, and default = 1 fills the first row of each group:
library(dplyr)
df %>%
  group_by(group1, group2) %>%
  mutate(new = lag(cumsum(!is.na(act)), default = 1))
# A tibble: 17 x 4
# Groups: group1, group2 [3]
group1 group2 act new
<int> <int> <chr> <dbl>
1 1 1 a 1
2 1 1 NA 1
3 1 1 NA 1
4 1 1 b 1
5 1 1 NA 2
6 1 1 a 2
7 1 1 NA 3
8 1 1 a 3
9 1 2 a 1
10 1 2 NA 1
11 1 2 a 1
12 2 1 a 1
13 2 1 NA 1
14 2 1 b 1
15 2 1 b 2
16 2 1 NA 3
17 2 1 a 3
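The same count-then-shift logic can also be sketched in base R with ave(), which applies a function within each group1/group2 combination. The data frame below is rebuilt from the question's example; this is one possible rendering under that assumption, not the only one.

```r
df <- data.frame(
  group1 = c(rep(1, 11), rep(2, 6)),
  group2 = c(rep(1, 8), rep(2, 3), rep(1, 6)),
  act = c("a", NA, NA, "b", NA, "a", NA, "a",
          "a", NA, "a",
          "a", NA, "b", "b", NA, "a"),
  stringsAsFactors = FALSE
)

# 1 where act holds a letter, 0 where it is NA
is_letter <- as.numeric(!is.na(df$act))

# Per group: running count of letters, shifted down one row, first row set to 1
df$New <- ave(is_letter, df$group1, df$group2,
              FUN = function(x) c(1, head(cumsum(x), -1)))
df
```

Here c(1, head(cumsum(x), -1)) plays the role of lag(..., default = 1): it drops the last running total and puts 1 in front.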

Using R code to reorganize data frame by randomly selecting one row from each combination

I have a data frame that looks like this:
Subject N S
Sub1-1 3 1
Sub1-2 3 1
Sub1-3 3 1
Sub1-4 3 1
Sub2-1 3 1
Sub2-2 3 1
Sub2-3 3 1
Sub2-4 3 1
Sub3-1 3 2
Sub3-2 3 2
Sub3-3 3 2
Sub4-1 3 2
Sub4-2 3 2
Sub4-3 3 2
Sub5-1 3 2
Sub5-2 3 2
Sub6-1 1 1
Sub6-2 1 1
Sub6-3 1 1
Sub7-1 1 1
Sub7-2 1 1
Sub7-3 1 1
Sub8-1 1 1
Sub8-2 1 1
Sub8-3 1 2
Sub9-1 1 2
Sub9-2 1 2
Sub1-1 1 2
Sub1-2 1 2
Sub1-3 1 2
Sub5-1 1 2
Sub5-2 1 2
Sub1-5 2 1
Sub1-6 2 1
Sub1-7 2 1
Sub1-5 2 1
Sub2-6 2 1
Sub2-5 2 1
Sub2-6 2 1
Sub2-7 2 1
Sub3-8 2 2
Sub3-5 2 2
Sub3-6 2 2
Sub4-7 2 2
Sub4-5 2 2
Sub4-6 2 2
Sub5-7 2 2
Sub5-8 2 2
As you can see in this data frame there are 6 different combinations in the N and S columns, and 8 consecutive rows of each combination. I want to create a new data frame where one row from each combination (be it 3 & 1 or 1 & 2) is randomly selected and then put into a new data frame so there are 8 consecutive rows of each different combination. That way the entire data frame of all 48 rows is completely reorganized. Is this possible in R code?
Edit: The desired output would be something like this, repeating until all 48 rows are filled. The subject for each row would have to be random, because each row is randomly selected from its N & S combination.
Subject N S
3 1
1 1
3 2
1 2
2 2
2 1
2 2
3 2
2 1
1 1
3 1
1 2
A solution using functions from dplyr.
# Load package
library(dplyr)
# Set seed for reproducibility
set.seed(123)
# Process the data
dt2 <- dt %>%
group_by(N, S) %>%
sample_n(size = 1)
# View the result
dt2
## A tibble: 6 x 3
## Groups: N, S [6]
# Subject N S
# <chr> <int> <int>
#1 Sub6-3 1 1
#2 Sub5-1 1 2
#3 Sub1-5 2 1
#4 Sub5-8 2 2
#5 Sub2-4 3 1
#6 Sub3-1 3 2
Update: Reorganize the rows
The following randomizes the order of all rows.
dt3 <- dt %>% slice(sample(1:n(), n()))
Data Preparation
dt <- read.table(text = "Subject N S
Sub1-1 3 1
Sub1-2 3 1
Sub1-3 3 1
Sub1-4 3 1
Sub2-1 3 1
Sub2-2 3 1
Sub2-3 3 1
Sub2-4 3 1
Sub3-1 3 2
Sub3-2 3 2
Sub3-3 3 2
Sub4-1 3 2
Sub4-2 3 2
Sub4-3 3 2
Sub5-1 3 2
Sub5-2 3 2
Sub6-1 1 1
Sub6-2 1 1
Sub6-3 1 1
Sub7-1 1 1
Sub7-2 1 1
Sub7-3 1 1
Sub8-1 1 1
Sub8-2 1 1
Sub8-3 1 2
Sub9-1 1 2
Sub9-2 1 2
Sub1-1 1 2
Sub1-2 1 2
Sub1-3 1 2
Sub5-1 1 2
Sub5-2 1 2
Sub1-5 2 1
Sub1-6 2 1
Sub1-7 2 1
Sub1-5 2 1
Sub2-6 2 1
Sub2-5 2 1
Sub2-6 2 1
Sub2-7 2 1
Sub3-8 2 2
Sub3-5 2 2
Sub3-6 2 2
Sub4-7 2 2
Sub4-5 2 2
Sub4-6 2 2
Sub5-7 2 2
Sub5-8 2 2",
header = TRUE, stringsAsFactors = FALSE)
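Shuffling every row independently does not guarantee that each run of six rows holds one row from every N/S combination, which is what the question's edit seems to ask for. A base R sketch of that stricter layout follows, assuming each combination has the same number of rows (8 here). The toy data and the helper column block are made up for illustration.

```r
set.seed(123)

# Toy stand-in for dt: 6 N/S combinations with 8 rows each (subject labels are invented)
toy <- expand.grid(k = 1:8, S = 1:2, N = 1:3)
toy$Subject <- paste0("Sub", seq_len(nrow(toy)))

# block (hypothetical helper): a random draw order 1..8 within each N/S combination
toy$block <- ave(seq_len(nrow(toy)), toy$N, toy$S,
                 FUN = function(i) sample(length(i)))

# Sort by block, breaking ties randomly so the combinations are shuffled inside each block
shuffled <- toy[order(toy$block, runif(nrow(toy))), c("Subject", "N", "S")]
head(shuffled, 12)
```

Each block of six consecutive rows then contains every N/S combination exactly once, in random order, and every original row is used exactly once.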

Merge data based on response pattern in R

I have a dataframe that has survey response items (scale 1-4). This is what the data looks like for the first 10 respondents:
Q20_1n Q20_3n Q20_5n Q20_7n Q20_9n Q20_11n Q20_13n Q20_15n Q20_17n
1 1 2 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1
3 2 1 1 1 1 1 1 2 2
4 4 4 2 2 3 3 4 4 3
5 1 1 1 1 1 1 1 2 1
6 4 4 4 3 4 4 2 4 4
7 3 3 4 3 3 3 4 4 3
8 3 3 2 2 4 2 3 3 2
9 1 1 1 1 1 1 1 1 1
10 1 1 1 1 1 1 1 1 1
I fit a graded response model to the data, and now have theta hats for each response pattern. There are 901 observations in the raw data, but only 547 observations of theta.hat. This is because there is a single theta.hat for each observed response pattern - e.g., a score of '1' across all items appears 94 times. The theta.hat dataframe looks like this:
Q20_1n Q20_3n Q20_5n Q20_7n Q20_9n Q20_11n Q20_13n Q20_15n Q20_17n Obs Theta
1 1 1 1 1 1 1 1 1 1 94 -1.307
2 1 1 1 1 1 1 1 1 2 10 -.816
3 1 1 1 1 1 1 1 1 4 1 -0.750
4 1 1 1 1 1 1 1 2 1 22 -.803
5 1 1 1 1 1 1 1 2 2 6 -.524
What I am trying to do is merge the theta.hats with the original data. This seems to require matching the response patterns across two datasets. So, for example, line 10 in the raw data (with all '1's) would receive a theta hat of -1.307 because it matched the response pattern in line 1 of the theta matrix. Both datasets are structured so each variable is a numeric column.
I'm not sure how to send a reproducible dataset for this case, but am happy to if you have suggestions.
Thank you,
Andrea
How about a simple merge? Assuming your first dataset (responses) is assigned to df.1 and the second dataset (modeled with theta) is assigned to df.2:
merge(df.1, df.2, by = names(df.1), all.x = TRUE)
# Q20_1n Q20_3n Q20_5n Q20_7n Q20_9n Q20_11n Q20_13n Q20_15n Q20_17n Obs Theta
# 1 1 1 1 1 1 1 1 1 1 94 -1.307
# 2 1 1 1 1 1 1 1 1 1 94 -1.307
# 3 1 1 1 1 1 1 1 1 1 94 -1.307
# 4 1 1 1 1 1 1 1 2 1 22 -0.803
# 5 1 2 1 1 1 1 1 1 1 NA NA
# 6 2 1 1 1 1 1 1 2 2 NA NA
# 7 3 3 2 2 4 2 3 3 2 NA NA
# 8 3 3 4 3 3 3 4 4 3 NA NA
# 9 4 4 2 2 3 3 4 4 3 NA NA
# 10 4 4 4 3 4 4 2 4 4 NA NA
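Note that merge() sorts its result by the key columns, so the rows no longer follow the order of the raw data. If the original order matters, one option is to carry a row counter through the merge. The sketch below uses two small toy frames with made-up columns Q1 and Q2 in place of the survey items; row_id is a hypothetical helper column.

```r
# Toy stand-ins: raw responses and a theta table with one row per response pattern
df.1 <- data.frame(Q1 = c(1, 2, 1), Q2 = c(1, 1, 2))
df.2 <- data.frame(Q1 = c(1, 1), Q2 = c(1, 2), Theta = c(-1.307, -0.803))

# Remember the input order before merging
df.1$row_id <- seq_len(nrow(df.1))

# Match on every item column, keeping unmatched respondents (Theta = NA)
items <- setdiff(names(df.1), "row_id")
out <- merge(df.1, df.2, by = items, all.x = TRUE)

# Restore the original respondent order
out <- out[order(out$row_id), ]
out
```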

Combining an individual and aggregate level data sets

I've got two different data frames, let's call them "Months" and "People".
Months looks like this:
Month Site X
1 1 4
2 1 3
3 1 5
1 2 10
2 2 7
3 2 5
and People looks like this:
ID Month Site
1 1 1
2 1 2
3 1 1
4 2 2
5 2 2
6 2 2
7 3 1
8 3 2
I'd like to combine them so essentially each time an entry in "People" has a particular Month and Site combination, it's added to the appropriate aggregated data frame, so I'd get something like the following:
Month Site X People
1 1 4 2
2 1 3 0
3 1 5 1
1 2 10 1
2 2 7 3
3 2 5 1
But I haven't the foggiest idea of how to go about doing that. Any suggestions?
Using base packages
> aggdata <- aggregate(ID ~ Month + Site, data = People, FUN = length)
> aggdata
Month Site ID
1 1 1 2
2 3 1 1
3 1 2 1
4 2 2 3
5 3 2 1
> res <- merge(Months, aggdata, all.x = TRUE)
> res
Month Site X ID
1 1 1 4 2
2 1 2 10 1
3 2 1 3 NA
4 2 2 7 3
5 3 1 5 1
6 3 2 5 1
> res[is.na(res)] <- 0
> res
Month Site X ID
1 1 1 4 2
2 1 2 10 1
3 2 1 3 0
4 2 2 7 3
5 3 1 5 1
6 3 2 5 1
Assuming your data.frames are months and people, here's a data.table solution:
library(data.table)
m.dt <- data.table(months, key = c("Month", "Site"))
p.dt <- data.table(people, key = c("Month", "Site"))
# one-liner: join on the keys and summarise once per row of m.dt (by = .EACHI)
dt.f <- p.dt[m.dt, list(X = X[1], People = sum(!is.na(ID))), by = .EACHI]
dt.f
# Month Site X People
# 1: 1 1 4 2
# 2: 1 2 10 1
# 3: 2 1 3 0
# 4: 2 2 7 3
# 5: 3 1 5 1
# 6: 3 2 5 1
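A further base R sketch that keeps Months in its original row order and names the new column People directly. It assumes each Month/Site pair appears only once in Months; key_m and key_p are illustrative helper names.

```r
Months <- data.frame(Month = rep(1:3, 2), Site = rep(1:2, each = 3),
                     X = c(4, 3, 5, 10, 7, 5))
People <- data.frame(ID = 1:8,
                     Month = c(1, 1, 1, 2, 2, 2, 3, 3),
                     Site = c(1, 2, 1, 2, 2, 2, 1, 2))

# Build a text key per Month/Site pair in both frames
key_m <- paste(Months$Month, Months$Site)
key_p <- paste(People$Month, People$Site)

# Tabulate People keys against the Months keys; absent pairs count as 0
Months$People <- as.integer(table(factor(key_p, levels = key_m)))
Months
```

Because the factor levels come from Months, combinations with no people get a count of 0 without a separate NA replacement step.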
