Participants in an experiment took a test that has a rule that says "once a participant has gotten 6 items wrong in a window of 8 items, you stop running the test". However, some experimenters kept testing past this point. I now need a way to automatically detect where the test should have been stopped and change all values after that point to 0 (= item wrong). I am not even sure if this is something that can be done in R.
To be clear, I would like to go row by row (each row is a participant), and once there are six 0s in a given window of 8 columns (items), I need all values after the sixth 0 to be 0 too.
The reproducible data is below, but here is a visualization of what I need, where the blue cells are the ones that should change to 0:
[Images: pre-changes / post-changes]
Reproducible data:
structure(list(Participant_ID = c("E01P01", "E01P02", "E01P03",
"E01P04", "E01P05", "E01P06", "E01P07", "E01P08", "E02P01", "E02P02"
), A2 = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1), A3 = c(1, 1, 0, 0, 0,
1, 0, 0, 0, 0), B1 = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1), B2 = c(1,
1, 1, 1, 1, 1, 0, 0, 0, 1), C3 = c(1, 0, 0, 1, 0, 1, 0, 0, 0,
1), C4 = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 1), D1 = c(1, 0, 0, 0,
0, 1, 0, 0, 0, 0), D3 = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 1), E1 = c(1,
0, 0, 0, 0, 1, 0, 0, 0, 1), E3 = c(1, 1, 0, 1, 0, 1, 0, 0, 0,
0), F1 = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0), F4 = c(1, 1, 1, 1,
0, 1, 0, 1, 1, 0), G1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1), G2 = c(0,
0, 0, 0, 1, 1, 1, 0, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Any help is highly appreciated!
Here is a solution that involves some pivoting, rollsum, cumsum, if_else logic, then pivoting back. Let me know if it works.
library(tidyverse)
library(zoo)
structure(list(Participant_ID = c("E01P01", "E01P02", "E01P03",
"E01P04", "E01P05", "E01P06", "E01P07", "E01P08", "E02P01", "E02P02"
), A2 = c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1), A3 = c(1, 1, 0, 0, 0,
1, 0, 0, 0, 0), B1 = c(1, 1, 1, 0, 0, 1, 0, 0, 1, 1), B2 = c(1,
1, 1, 1, 1, 1, 0, 0, 0, 1), C3 = c(1, 0, 0, 1, 0, 1, 0, 0, 0,
1), C4 = c(1, 0, 0, 0, 0, 1, 0, 0, 1, 1), D1 = c(1, 0, 0, 0,
0, 1, 0, 0, 0, 0), D3 = c(1, 1, 1, 1, 0, 0, 1, 0, 0, 1), E1 = c(1,
0, 0, 0, 0, 1, 0, 0, 0, 1), E3 = c(1, 1, 0, 1, 0, 1, 0, 0, 0,
0), F1 = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0), F4 = c(1, 1, 1, 1,
0, 1, 0, 1, 1, 0), G1 = c(1, 0, 0, 0, 0, 1, 0, 0, 0, 1), G2 = c(0,
0, 0, 0, 1, 1, 1, 0, 1, 1)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")) %>%
as_tibble() %>%
pivot_longer(-1) %>%
group_by(Participant_ID) %>%
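mutate(# wrong answers (0s) among the last 8 items; partial = TRUE also checks
# items 1-7, where fewer than 8 items have been given so far
running_total = zoo::rollapplyr(value == 0, width = 8, FUN = sum, partial = TRUE),
# > 0 from the first item at which 6 of the last 8 items are wrong
should_terminate = cumsum(running_total >= 6),
value = if_else(should_terminate > 0, 0, value)) %>%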
ungroup() %>%
select(Participant_ID, name, value) %>%
pivot_wider(names_from = name, values_from = value)
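If you prefer to avoid pivoting, the same rule can be applied row by row in base R. This is a minimal sketch, not the answer's method: `apply_stop_rule()` and the demo data frame `scores` are hypothetical names, and the window and threshold are parameters defaulting to the 8 and 6 from the question. Windows at the start of the test are allowed to be shorter than 8 items, so six wrong answers among the first six items also trigger the stop.

```r
# Hypothetical helper: apply the "6 wrong within 8 items" stopping rule to one participant
apply_stop_rule <- function(x, window = 8, max_wrong = 6) {
  wrong <- x == 0
  for (i in seq_along(x)) {
    # window of up to `window` items ending at item i (shorter at the start)
    start <- max(1, i - window + 1)
    if (sum(wrong[start:i]) >= max_wrong) {
      x[i:length(x)] <- 0  # the test should have stopped here: score the rest as wrong
      break
    }
  }
  x
}

# Participant E01P05 from the question's data (items A2 ... G2)
e01p05 <- c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1)
apply_stop_rule(e01p05)

# Applying it to every row of a (hypothetical) data frame whose first column is the ID
scores <- data.frame(Participant_ID = c("P1", "P2"),
                     I1 = c(0, 1), I2 = c(0, 1), I3 = c(0, 0), I4 = c(0, 1),
                     I5 = c(0, 1), I6 = c(0, 1), I7 = c(1, 1), I8 = c(1, 0))
scores[-1] <- as.data.frame(t(apply(scores[-1], 1, apply_stop_rule)))
scores
```

For E01P05 the sixth wrong answer falls on item 7 (D1), so only the later 1s (F1 and G2) are set to 0. The explicit loop is slower than a vectorised rolling sum, but with 14 items per participant that hardly matters, and the stopping point is easy to inspect.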
In my dataframe, the three responses (yes, maybe, no) to a question are printed as three separate variables (a binary outcome of each possible response).
I want to combine the three binary responses into one variable, showing which response was selected.
The following piece of code does this:
data$var1 <- ifelse(data$var1.Yes, 0,
ifelse(data$var1.Maybe, 1,
ifelse(data$var1.No,2, NA)))
However, because I have many variables (e.g., var1, var2, var3, etc.), I want to write a function or loop that runs this code for multiple variables whose column names include ascending numbers.
I thought of the following function:
fun <- function(i){
paste0("data$var", i) <- ifelse(paste0("data$var", i, ".Yes"), 0,
ifelse(paste0("data$var",i,".Maybe"), 1,
ifelse(paste0("data$var",i,".No"),2, NA)))
}
fun(1:3)
Unfortunately, this does not work. How can I apply this function to several variables at once?
dput(test)
structure(list(var1.Yes = c(0, 0, 1, 0, 1, 1, 1, 0, NA, 1),
var1.Maybe = c(1, 0, 0, 1, 0, 0, 0, 0, NA, 0),
var1.No = c(0, 1, 0, 0, 0, 0, 0, 1, NA, 1),
var2.Yes = c(0, 0, 1, NA, 1, 1, 1, 0, 0, 1),
var2.Maybe = c(0, 1, 1, NA, 0, 0, 0, 0, 0, 0),
var2.No = c(1, 0, 0, NA, 0, 0, 0, 1, 1, 0),
var3.Yes = c(0, 1, 0, 0, 0, 0, 0, NA, 0, 1),
var3.Maybe = c(0, 0, 0, 0, 1, 1, 1, NA, 1, 0),
var3.No = c(1, 0, 1, 1, 0, 0, 0, NA, 0, 0)),
class = "data.frame", row.names = c(NA, -10L))
You can loop through the columns in groups of three (Yes, Maybe, No):
lapply_results <- lapply(1:(ncol(test)/3), function(col) ifelse(test[,col*3-2], 0,
ifelse(test[,col*3-1], 1,
ifelse(test[,col*3], 2, NA))))
lapply_results
# [[1]]
# [1] 1 2 0 1 0 0 0 2 NA 0
#
# [[2]]
# [1] 2 1 0 NA 0 0 0 2 2 0
#
# [[3]]
# [1] 2 0 2 2 1 1 1 NA 1 0
This can be merged with your data:
cbind(test, matrix(unlist(lapply_results), nrow = nrow(test)))
Data:
data.frame(
var1.Yes = c(0, 0, 1, 0, 1, 1, 1, 0, NA, 1),
var1.Maybe= c(1, 0, 0, 1, 0, 0, 0, 0, NA, 0),
var1.No = c(0, 1, 0, 0, 0, 0, 0, 1, NA, 1),
var2.Yes = c(0, 0, 1, NA, 1, 1, 1, 0, 0, 1),
var2.Maybe= c(0, 1, 1, NA, 0, 0, 0, 0, 0, 0),
var2.No = c(1, 0, 0, NA, 0, 0, 0, 1, 1, 0),
var3.Yes = c(0, 1, 0, 0, 0, 0, 0, NA, 0, 1),
var3.Maybe= c(0, 0, 0, 0, 1, 1, 1, NA, 1, 0),
var3.No = c(1, 0, 1, 1, 0, 0, 0, NA, 0, 0)) -> test
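The question's function fails because `paste0("data$var", i)` only builds a character string; it does not refer to a column. A sketch of a name-based alternative: build each column name with `paste0()` and look it up with `[[ ]]`, which also avoids relying on the columns being in Yes/Maybe/No order. It assumes every variable has exactly the three columns `<prefix>.Yes`, `<prefix>.Maybe` and `<prefix>.No`; `combine_responses()` and `demo` are hypothetical names.

```r
# Hypothetical helper: add one coded column per prefix (Yes = 0, Maybe = 1, No = 2)
combine_responses <- function(data) {
  # "var1.Yes", "var1.Maybe", ... -> "var1"
  prefixes <- unique(sub("\\.(Yes|Maybe|No)$", "", names(data)))
  for (p in prefixes) {
    yes   <- data[[paste0(p, ".Yes")]]
    maybe <- data[[paste0(p, ".Maybe")]]
    no    <- data[[paste0(p, ".No")]]
    data[[p]] <- ifelse(yes == 1, 0, ifelse(maybe == 1, 1, ifelse(no == 1, 2, NA)))
  }
  data
}

demo <- data.frame(var1.Yes = c(0, 0, 1, NA), var1.Maybe = c(1, 0, 0, NA),
                   var1.No = c(0, 1, 0, NA), var2.Yes = c(1, 0, 0, 0),
                   var2.Maybe = c(0, 0, 1, 0), var2.No = c(0, 1, 0, 1))
out <- combine_responses(demo)
out[, c("var1", "var2")]
```

Applied to `test`, `combine_responses(test)` would add the columns `var1`, `var2` and `var3` next to the original indicators.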
I have this data.frame
df <- data.frame(
variable=c(2.4860651, -0.68863024, 2.63530974, -2.95754943, 1.67945091, 2.63530974,
4.79002539, 2.32575938, 3.57236441, -0.364825998, -2.00646016, -3.12380516,
0.69307013, -5.65846824, 0.45632519, 2.08978142),
A=c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
B=c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0),
C=c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1),
D=c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0),
E=c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
F=c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1))
I would like to run wilcox.test for each column, with the groups defined by the 0s and 1s in that column and the values taken from df$variable. Then I want to add the p-values in a new row and the adjusted p-values in another row.
I have tried this:
library(dplyr)
result <- df %>% summarise(across(!variable, ~wilcox.test(.x ~ variable)$p.value), exact=NULL) %>%
bind_rows(., p.adjust(., method = 'BH')) %>%
bind_rows(df, .) %>%
mutate(variable=replace(variable, is.na(variable), c('p.values', 'p.adjust')))
But this causes errors.
This is the result I would like to get:
result <- data.frame(
variable=c(2.4860651, -0.68863024, 2.63530974, -2.95754943, 1.67945091, 2.63530974,
4.79002539, 2.32575938, 3.57236441, -0.364825998, -2.00646016, -3.12380516,
0.69307013, -5.65846824, 0.45632519, 2.08978142, 'p.value', 'p.adjust'),
A=c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1),
B=c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0.560444274, 1),
C=c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0.143117298, 0.764253489),
D=c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0.820753088, 1),
E=c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0.95482869, 1),
F=c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0.254751163, 0.764253489))
Can anyone help?
You may try something along the lines of the following:
library(dplyr)
tmp <- df %>% summarise(across(!variable,
~wilcox.test(variable[.x == 0], variable[.x == 1])$p.value))
adj_value <- p.adjust(unlist(tmp), method = "BH")
result <- bind_rows(df %>% mutate(variable = as.character(variable)),
rbind(tmp, adj_value) %>%
mutate(variable = c('p.values', 'p.adjust'))
)
Thank you, Ronak. I modified your earlier answer and found that this also works and gives the same result as yours:
result <- df %>%
summarise(across(!variable,
~wilcox.test(variable[.x == 0], variable[.x == 1])$p.value), exact=NULL) %>%
bind_rows(., p.adjust(., method = 'BH')) %>%
bind_rows(df, .) %>%
mutate(variable=replace(variable, is.na(variable), c('p.values', 'p.adjust')))
Thank you! :)
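For comparison, here is a base R sketch of the same idea (not taken from either answer): `sapply()` runs one Wilcoxon rank-sum test per indicator column, `p.adjust()` applies the Benjamini-Hochberg correction, and the two rows are appended after converting `variable` to character. The names `p_values`, `p_adjusted`, `stats_rows` and `df_chr` are illustrative.

```r
# Data from the question
df <- data.frame(
  variable = c(2.4860651, -0.68863024, 2.63530974, -2.95754943, 1.67945091, 2.63530974,
               4.79002539, 2.32575938, 3.57236441, -0.364825998, -2.00646016, -3.12380516,
               0.69307013, -5.65846824, 0.45632519, 2.08978142),
  A = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0),
  B = c(1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0),
  C = c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1),
  D = c(1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0),
  E = c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),
  F = c(0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1))

# One test per indicator column: values of `variable` in group 0 vs group 1
# (suppressWarnings hides the "cannot compute exact p-value with ties" warning)
p_values <- sapply(df[-1], function(g)
  suppressWarnings(wilcox.test(df$variable[g == 0], df$variable[g == 1])$p.value))
p_adjusted <- p.adjust(p_values, method = "BH")

# Append the two rows; `variable` becomes character so it can hold the row labels
stats_rows <- cbind(variable = c("p.values", "p.adjust"),
                    as.data.frame(rbind(p_values, p_adjusted)))
df_chr <- df
df_chr$variable <- as.character(df_chr$variable)
result <- rbind(df_chr, stats_rows)
rownames(result) <- NULL
tail(result, 3)
```

Keeping the p-values in a named vector before binding them makes it easy to pass exactly those numbers to `p.adjust()`, rather than a one-row data frame.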
I have two data frames:
passesComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 0, 1), P2_Good = c(2,
0, 0, 0, 0, 0, 2), P3_Good = c(0, 1, 0, 0, 0, 0, 1), P4_Good = c(0,
0, 1, 0, 0, 0, 1), P5_Good = c(0, 0, 0, 1, 0, 0, 1), P1_Bad = c(0,
0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 0, 0, 0, 0, 0), P3_Bad = c(0,
0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 1, 0, 0, 0, 1), P5_Bad = c(0,
0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 1, 0, 0, 1, 1), `Good Pass` = c(2,
2, 1, 1, 0, 3, 6), `Intercepted Pass` = c(0, 0, 0, 0, 0, 1, 0
), Turnover = c(0, 0, 0, 0, 0, 1, 0), totalEvents = c(2, 2, 2,
1, 0, 6, 7)), row.names = c("P1", "P2", "P3", "P4", "P5", "Opponent",
"VT"), class = "data.frame")
of size 7x15, and
copyComb <- structure(list(P1_Good = c(0, 1, 0, 0, 0, 1), P2_Good = c(2,
0, 0, 0, 0, 2), P4_Good = c(0, 0, 0, 0, 0, 0), P5_Good = c(0,
0, 1, 0, 0, 1), P1_Bad = c(0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0,
0, 0, 0, 0), P3_Bad = c(0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 0,
0, 0, 0), P5_Bad = c(0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0,
0, 0, 1, 0), `Good Pass` = c(2, 1, 1, 0, 3, 4), `Intercepted Pass` = c(0,
0, 0, 0, 1, 0), Turnover = c(0, 0, 0, 0, 1, 0), totalEvents = c(2,
1, 1, 0, 6, 4)), row.names = c("P1", "P2", "P4", "P5", "Opponent",
"VT"), class = "data.frame")
or, more simply, derived from passesComb:
copyComb <- passesComb
copyComb <- copyComb[-3,-3]
#Updating specific cells since [3,3] is removed
copyComb[2,11] <- 1
copyComb[2,14] <- 1
copyComb[6,8] <- 0
copyComb[6,3] <- 0
copyComb[6,10] <- 0
copyComb[6,11] <- 4
copyComb[6,14] <- 4
#This now equals the copyComb from dput() above
of size 6x14.
I am trying to add these two data frames together based on matching row/column names. I tried to achieve this using the code from the answer to this post:
gamesComb <- data.frame(matrix(NA, nrow = ifelse(nrow(passesComb) >= nrow(copyComb), nrow(passesComb),nrow(copyComb)),
ncol = ifelse(ncol(passesComb) >= ncol(copyComb), ncol(passesComb),ncol(copyComb))))
gamesComb[row.names(ifelse(nrow(passesComb) >= nrow(copyComb), passesComb, copyComb)),
colnames(ifelse(ncol(passesComb) >= ncol(copyComb), passesComb, copyComb))] <- passesComb
Here, I create a data frame, gamesComb, with the dimensions of whichever of passesComb or copyComb is bigger. It does create a 7x15 data frame, but it doesn't add the row/column names.
I then want to add the two data frames together cell by cell wherever the row/column names match (as in the post linked above), i.e. passesComb["P2","P1_Good"] = 1 and copyComb["P2","P1_Good"] = 1, so gamesComb["P2","P1_Good"] should be 2, and the same for all other matching row/column names.
So the final result should look like:
expectedOutput <- structure(list(P1_Good = c(0, 2, 0, 0, 0, 0, 2), P2_Good = c(4,
0, 0, 0, 0, 0, 4), P3_Good = c(0, 1, 0, 0, 0, 0, 1), P4_Good = c(0,
0, 1, 0, 0, 0, 1), P5_Good = c(0, 0, 0, 2, 0, 0, 2), P1_Bad = c(0,
0, 0, 0, 0, 0, 0), P2_Bad = c(0, 0, 0, 0, 0, 0, 0), P3_Bad = c(0,
0, 0, 0, 0, 0, 0), P4_Bad = c(0, 0, 1, 0, 0, 0, 1), P5_Bad = c(0,
0, 0, 0, 0, 0, 0), `Bad Pass` = c(0, 0, 1, 0, 0, 2, 1), `Good Pass` = c(4,
3, 1, 2, 0, 6, 10), `Intercepted Pass` = c(0, 0, 0, 0, 0, 2,
0), Turnover = c(0, 0, 0, 0, 0, 2, 0), totalEvents = c(4, 3,
2, 2, 0, 12, 11)), row.names = c("P1", "P2", "P3", "P4", "P5",
"Opponent", "VT"), class = "data.frame")
Here's a dplyr/tidyr approach where I reshape each table into a long format, then join them, sum, and pivot wider again.
library(dplyr); library(tidyr); library(tibble) # rownames_to_column() comes from tibble
lengthen <- function(df) { df %>% rownames_to_column(var = "row") %>% pivot_longer(-row)}
full_join(lengthen(passesComb), lengthen(copyComb), by = c("row", "name")) %>%
mutate(new_val = coalesce(value.x, 0) + coalesce(value.y, 0)) %>%
select(-starts_with("value")) %>%
pivot_wider(names_from = name,values_from = new_val)
Another option is to stack them and then sum by rowname groups.
library(dplyr, warn.conflicts = FALSE)
library(tibble)
out <-
rownames_to_column(passesComb) %>%
bind_rows(rownames_to_column(copyComb)) %>%
# bind_rows(rownames_to_column(third_table)) %>% if you want to add another
select(rowname, names(passesComb)) %>%
group_by(rowname) %>%
summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>%
slice(match(rownames(passesComb), rowname)) %>%
column_to_rownames('rowname')
all.equal(out, expectedOutput)
#> [1] TRUE
Created on 2021-10-09 by the reprex package (v2.0.1)
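The original attempt of pre-allocating a result and filling it by names can also be made to work in base R. This is a sketch under the assumption that both inputs are all-numeric with unique row and column names; `add_by_names()` and the small `x`/`y` tables are hypothetical. It builds a zero matrix over the union of names and adds each table into its own rows and columns.

```r
# Hypothetical helper: cell-wise sum of two numeric data frames, aligned by row and column names
add_by_names <- function(x, y) {
  rows <- union(rownames(x), rownames(y))  # keeps x's order, then rows only in y
  cols <- union(colnames(x), colnames(y))
  out <- matrix(0, nrow = length(rows), ncol = length(cols),
                dimnames = list(rows, cols))
  # cells absent from one table simply receive nothing from it (stay at 0)
  out[rownames(x), colnames(x)] <- out[rownames(x), colnames(x)] + as.matrix(x)
  out[rownames(y), colnames(y)] <- out[rownames(y), colnames(y)] + as.matrix(y)
  as.data.frame(out)
}

x <- data.frame(a = c(1, 2), b = c(3, 4), row.names = c("r1", "r2"))
y <- data.frame(a = 10, c = 5, row.names = "r2")
res <- add_by_names(x, y)
res
```

With the question's tables, `add_by_names(passesComb, copyComb)` should produce the layout of expectedOutput; comparing the two with `all.equal()` is a quick way to confirm that for your data.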
I have a 3185x90 dataset of binary values and want to do a chi-squared test of independence, comparing all column variables against each other.
I've tried different variations of code from Google searches using chisq.test() and some for loops, but none of them have worked so far.
How do I do this?
This is the frame I've tinkered with. My dataset is oak.
chi_trial <- data.frame(a = c(0,1), b = c(0,1))
for(row in 1:nrow(oak)){
print(row)
print(chisq.test(c(oak[row,1],d[row,2])))
}
I also tried this:
apply(d, 1, chisq.test)
which gives me the error: Error in FUN(newX[, i], ...) :
all entries of 'x' must be nonnegative and finite
dput(oak[1:2],)
structure(list(post_flu = structure(c(1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1,
1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
label = "Receipt of Flu Vaccine - Encounter Survey", format.stata = "%10.0g")), row.names = c(NA,
-3185L), class = c("tbl_df", "tbl", "data.frame"), label = "Main Oakland Clinic Analysis Dataset")
I added a sample of my data with the final lines of the output. The portion of the dataset is small, but it all looks like this.
You could use something like the code below, which, like R's cor function, returns a matrix of pairwise results. I don't have your data, so I'm simulating some. Note that I get one significant p-value, using the traditional cut-off of 0.05.
set.seed(3)
nr=3185; nc=3
oak <- as.data.frame(matrix(sample(0:1, size=nr*nc, replace=TRUE), ncol=nc))
oak
mult.chi <- function(data){
nc <- ncol(data)
res <- matrix(0, nrow=nc, ncol=nc) # or NA
for(i in 1:(nc-1))
for(j in (i+1):nc)
res[i,j] <- suppressWarnings(chisq.test(data[,i], data[,j])$p.value)
rownames(res) <- colnames(data)
colnames(res) <- colnames(data)
res
}
mult.chi(oak)
# V1 V2 V3
# V1 0 0.7847063 0.32012466
# V2 0 0.0000000 0.01410326
# V3 0 0.0000000 0.00000000
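With 90 columns you would run 90 × 89 / 2 = 4005 tests, so some p-values below 0.05 are expected by chance alone. Consider applying a multiple-testing adjustment, as mentioned in the comments.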
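A minimal sketch of such an adjustment, assuming simulated data in place of the real `oak` set (`sim`, `pairs`, `p_raw` and `res` are illustrative names): one `chisq.test()` per column pair, collected in a long table with Benjamini-Hochberg adjusted p-values from `p.adjust()`. `method = "bonferroni"` is a stricter alternative.

```r
# Simulated binary data (an assumption standing in for the real 3185 x 90 `oak` data)
set.seed(42)
sim <- as.data.frame(matrix(sample(0:1, 500 * 4, replace = TRUE), ncol = 4))

# Every pair of columns, one chi-squared test per pair
pairs <- combn(names(sim), 2)
p_raw <- apply(pairs, 2, function(v)
  suppressWarnings(chisq.test(sim[[v[1]]], sim[[v[2]]])$p.value))

# Long table of pairs with Benjamini-Hochberg adjusted p-values
res <- data.frame(var1 = pairs[1, ], var2 = pairs[2, ],
                  p = p_raw, p_bh = p.adjust(p_raw, method = "BH"))
res[order(res$p_bh), ]
```

For 90 columns a long table of 4005 rows is easier to sort and filter than a 90 x 90 matrix.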
Here is a solution that uses combn to get all pairs of column numbers. Tested with the data in Edward's answer.
chisq2cols <- function(X){
y <- matrix(0, ncol(X), ncol(X))
cmb <- combn(ncol(X), 2)
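# index y by the (i, j) pairs themselves: upper.tri() lists cells in a
# different order than combn() once there are more than 3 columns
y[t(cmb)] <- apply(cmb, 2, function(k){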
tbl <- table(X[k])
chisq.test(tbl)$p.value
})
y
}
chisq2cols(oak)
# [,1] [,2] [,3]
#[1,] 0 0.7847063 0.32012466
#[2,] 0 0.0000000 0.01410326
#[3,] 0 0.0000000 0.00000000
Hi everybody, I removed my last post in order to make a reproducible example of my problem. I am working with the following two data frames, a1 (dput structure):
structure(list(r04_numero_operacion = c("0050475725", "0050490602",
"0050491033", "0050496386", "0050518985", "0050630090", "0050631615",
"0060235906", "0060238732", "0060241333", "0060244391", "0060245813",
"0060260056", "0060266356", "0800041441", "0800054041", "0800055382",
"0800058554", "2020200062", "2020200073", "CAR1010001706000",
"CAR1010001795000", "CAR1010001803000", "CAR1010001871000", "CAR1010001962000",
"CAR1010002002000", "CAR1010002120000", "CAR1010002189000", "CAR1010002215000",
"CAR1010002250000"), perdida3 = c(523.12, 265.43, 8371.66, 5242.13,
4960.51, 8473.27, 3743.45, 1283.32, 2229.25, 8001.27, 8653.94,
3670.13, 4536.02, 8216.55, 2481.36, 288.94, 1637.28, 4566.89,
1573.63, 11217.92, 0, 0, 0, 0, 0, 0, 0, 0, 9633.9, 0), Saldo = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89,
1, 1, 481.59, 299.52, 258.13, 603.84, 231.61, 631.68, 220.6,
210.54, 1, 1224.44), Bvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 603.84, 0, 631.68,
0, 0, 0, 0), Cvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1224.44),
Dvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), vencida = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28,
4566.89, 1, 1, 0, 0, 0, 603.84, 0, 631.68, 0, 0, 1, 1224.44
), V1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("r04_numero_operacion",
"perdida3", "Saldo", "Bvencida", "Cvencida", "Dvencida", "vencida",
"V1"), codepage = 1252L, row.names = c(NA, 30L), class = "data.frame")
And the a2 data frame (dput structure):
structure(list(r04_numero_operacion = c("0050475725", "0050490602",
"0050491033", "0050496386", "0050518985", "0050630090", "0050631615",
"0060235906", "0060238732", "0060241333", "0060244391", "0060245813",
"0060260056", "0060266356", "0800041441", "0800054041", "0800055382",
"0800058554", "2020200073", "CAR1010002002000", "CAR1010002189000",
"CAR1010002215000", "CAR1010002250000", "CAR1010002264000", "CAR1010002297000",
"CAR1010002401000", "CAR1010002412000", "CAR1010002436000", "CAR1010002529000",
"CAR1010002709000"), perdida3 = c(523.12, 265.43, 8371.66, 5242.13,
4960.51, 8473.27, 3743.45, 1283.32, 2229.25, 8001.27, 8653.94,
3670.13, 4536.02, 8216.55, 2481.36, 288.94, 1637.28, 4566.89,
11217.92, 0, 0, 9633.9, 0, 0, 0, 0, 0, 0, 0, 0), Saldo = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89,
1, 317.72, 210.54, 1, 868.93, 242.91, 298.78, 120.63, 255.01,
357.68, 284.08, 308.83), Bvencida = c(0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 317.72, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), Cvencida = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 868.93, 0, 0, 0, 0, 0, 0, 0), Dvencida = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0), vencida = c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 288.94, 1637.28, 4566.89, 1, 317.72, 0,
1, 868.93, 0, 0, 0, 0, 0, 0, 0), V2 = c(2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2)), .Names = c("r04_numero_operacion", "perdida3", "Saldo",
"Bvencida", "Cvencida", "Dvencida", "vencida", "V2"), class = "data.frame", row.names = c(NA,
30L))
My problem is with the merge() and match() functions. merge() is more versatile than match() for adding new variables by a common key, but when I use merge() I don't get the same result as with match(). First I used merge() on a2 and a1 to create DF with the following code:
DF=merge(a2,a1,all.x=TRUE)
It added the V1 variable from a1 to DF, and I got this summary for DF$V1:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1 1 1 1 1 1 9
Then I created a copy of a2 named DF and used match() on r04_numero_operacion to add the V1 variable from a1:
DF <- a2
DF$V1 <- a1[match(DF$r04_numero_operacion, a1$r04_numero_operacion), "V1"]
It added V1 to DF, but the result differs from the merge() approach. I got this summary for DF$V1 with the match() solution:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1 1 1 1 1 1 7
What I want is to reproduce the match() result using merge(), because merge() is more powerful than match(). Thanks for your help.
With match(a2$r04_numero_operacion, a1$r04_numero_operacion), only the r04_numero_operacion values of a2 are matched against the same column in a1. With merge(a2, a1, all.x = TRUE), merge() by default joins on every column the two data frames share (here r04_numero_operacion, perdida3, Saldo, Bvencida, Cvencida, Dvencida and vencida), so a row only finds a partner if all of those values agree. If you merge on the first column only, the NA counts agree:
summary( merge(a2,a1,by=1,all.x=TRUE)$V1 )
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1 1 1 1 1 1 7
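A small sketch of the difference, using hypothetical stand-ins `a1_demo` and `a2_demo` for a1 and a2: naming the key in `by` and passing only the key plus the column to add from a1 reproduces match() and avoids the duplicated `.x`/`.y` columns that `by = 1` creates. One remaining difference is that merge() sorts the result by the key by default, while match() keeps a2's row order.

```r
# Hypothetical small tables standing in for a1 and a2
a1_demo <- data.frame(id = c("k1", "k2", "k3"), saldo = c(1, 5, 7), V1 = 1)
a2_demo <- data.frame(id = c("k3", "k1", "k4"), saldo = c(7, 2, 9), V2 = 2)

# merge() on every shared column (id and saldo): k1 has a different saldo, so V1 is NA
all_cols <- merge(a2_demo, a1_demo, all.x = TRUE)

# merge() on the key only, keeping just the column to add from a1
by_key <- merge(a2_demo, a1_demo[, c("id", "V1")], by = "id", all.x = TRUE)

# match() equivalent, which keeps a2's row order
a2_demo$V1 <- a1_demo$V1[match(a2_demo$id, a1_demo$id)]

sum(is.na(all_cols$V1))
sum(is.na(by_key$V1))
```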