Apply conditional selection to sequence of columns R - r

I use data from the NHANES periodontal dataset (https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/OHXPER_F.htm) and after cleaning it to only keep the "pc" variables, I have a df=setPD 168 columns that include 6 measurements (pcd, pcm, pcs, pcp, pcl, pca) around 28 teeth numbered from #02 to #31
#names(setPD)
[1] "ohx02pcd" "ohx02pcm" "ohx02pcs" "ohx02pcp" "ohx02pcl" "ohx02pca" "ohx03pcd" "ohx03pcm" "ohx03pcs" "ohx03pcp" "ohx03pcl" "ohx03pca"
[13] "ohx04pcd" "ohx04pcm" "ohx04pcs" "ohx04pcp" "ohx04pcl" "ohx04pca" "ohx05pcd" "ohx05pcm" "ohx05pcs" "ohx05pcp" "ohx05pcl" "ohx05pca"
[25] "ohx06pcd" "ohx06pcm" "ohx06pcs" "ohx06pcp" "ohx06pcl" "ohx06pca" "ohx07pcd" "ohx07pcm" "ohx07pcs" "ohx07pcp" "ohx07pcl" "ohx07pca"
[37] "ohx08pcd" "ohx08pcm" "ohx08pcs" "ohx08pcp" "ohx08pcl" "ohx08pca" "ohx09pcd" "ohx09pcm" "ohx09pcs" "ohx09pcp" "ohx09pcl" "ohx09pca"
[49] "ohx10pcd" "ohx10pcm" "ohx10pcs" "ohx10pcp" "ohx10pcl" "ohx10pca" "ohx11pcd" "ohx11pcm" "ohx11pcs" "ohx11pcp" "ohx11pcl" "ohx11pca"
[61] "ohx12pcd" "ohx12pcm" "ohx12pcs" "ohx12pcp" "ohx12pcl" "ohx12pca" "ohx13pcd" "ohx13pcm" "ohx13pcs" "ohx13pcp" "ohx13pcl" "ohx13pca"
[73] "ohx14pcd" "ohx14pcm" "ohx14pcs" "ohx14pcp" "ohx14pcl" "ohx14pca" "ohx15pcd" "ohx15pcm" "ohx15pcs" "ohx15pcp" "ohx15pcl" "ohx15pca"
[85] "ohx18pcd" "ohx18pcm" "ohx18pcs" "ohx18pcp" "ohx18pcl" "ohx18pca" "ohx19pcd" "ohx19pcm" "ohx19pcs" "ohx19pcp" "ohx19pcl" "ohx19pca"
[97] "ohx20pcd" "ohx20pcm" "ohx20pcs" "ohx20pcp" "ohx20pcl" "ohx20pca" "ohx21pcd" "ohx21pcm" "ohx21pcs" "ohx21pcp" "ohx21pcl" "ohx21pca"
[109] "ohx22pcd" "ohx22pcm" "ohx22pcs" "ohx22pcp" "ohx22pcl" "ohx22pca" "ohx23pcd" "ohx23pcm" "ohx23pcs" "ohx23pcp" "ohx23pcl" "ohx23pca"
[121] "ohx24pcd" "ohx24pcm" "ohx24pcs" "ohx24pcp" "ohx24pcl" "ohx24pca" "ohx25pcd" "ohx25pcm" "ohx25pcs" "ohx25pcp" "ohx25pcl" "ohx25pca"
[133] "ohx26pcd" "ohx26pcm" "ohx26pcs" "ohx26pcp" "ohx26pcl" "ohx26pca" "ohx27pcd" "ohx27pcm" "ohx27pcs" "ohx27pcp" "ohx27pcl" "ohx27pca"
[145] "ohx28pcd" "ohx28pcm" "ohx28pcs" "ohx28pcp" "ohx28pcl" "ohx28pca" "ohx29pcd" "ohx29pcm" "ohx29pcs" "ohx29pcp" "ohx29pcl" "ohx29pca"
[157] "ohx30pcd" "ohx30pcm" "ohx30pcs" "ohx30pcp" "ohx30pcl" "ohx30pca" "ohx31pcd" "ohx31pcm" "ohx31pcs" "ohx31pcp" "ohx31pcl" "ohx31pca"
I am trying to apply a conditional selection in each group of six columns. This is:
transmute(setPD,PD02 = ifelse(setPD$ohx02pcd >5 |
setPD$ohx02pcm>5 |setPD$ohx02pcs >5|
setPD$ohx02pcp >5 | setPD$ohx02pcl >5 |
setPD$ohx02pca >5, 1, 0))
Then for the next tooth (03) I have to write again:
transmute(setPD,PD03 = ifelse(setPD$ohx03pcd >5 |
setPD$ohx03pcm>5|setPD$ohx03pcs >5|
setPD$ohx03pcp >5|setPD$ohx03pcl >5|
setPD$ohx03pca >5, 1, 0))
I tried to firstly do that conditional selection in a more efficient way, something like:
transmute(setPD,PD02 = ifelse(list(setPD$ohx02pcd:setPD$ohx02pcp) >5, 1, 0))
but it does not work.
Then I am looking for a way to write a loop that does that over each tooth without needing to write this 28 times!!
I thought of applying the select function of dplyr in a for loop but I don't know how to do that.
At the end I want to get all the new columns I made with transmute and say that if at least 2 of the 28 columns are 1, then I have disease, if <2 are 1 then I have health. ANy help would be appreciated.
**Note: If you want to get the dataset, it is open access from CDC.org:
https://wwwn.cdc.gov/Nchs/Nhanes/2009-2010/OHXPER_F.htm **

First, it is useful to point out that the logical statements of the form is A true OR is B true OR is C true are equivalent to asking is ANY of A,B,C true? We can use this to simplify the statements setPD$ohx02pcd >5 | setPD$ohx02pcm>5 |setPD$ohx02pcs >5| ... to ask if for any of these columns it is true that their value is larger than 5.
For example, let us focus on tooth number 02 first. To get all columns that concern this tooth, we can use grep to get a vector of column names. This can be achieved with
current_tooth <- grep("02", names(setPD), value = T)
Note that if there are any other columns in the data that contain the string 02, these columns will also show up. This does not appear to be the case in your data, but it is worthwhile pointing out here in case someone else uses it and this applies in other datasets.
Now, we can use these names to subset the dataframe. For instance,
setPD[,current_tooth]
will give you the corresponding columns. In each row, we want to check if any of the above mentioned conditions are true. Given a vector of logical statements, we can check if any of them is true with the function any. To go through a dataframe by row and apply a function, we can use apply, such as in
setPD$PD02 <-
apply(setPD[,grep("02", names(setPD), value = T)], 1, function(x) any(x>5))
Now, the above applies to one tooth only, namely 02. One way of doing it for all teeth is to create a vector with all tooth indicators and use this to loop over the above lines, replacing the "02" in the above grep call in each iteration and using assign or something similar to get the variable name right. A more elegant and more efficient way is to use the same principle on long data. Consider the following:
library(reshape2)
library(dplyr)
m <- melt(setPD, id.vars="SEQN")
m$num <- substr(m$variable, 4,5) # be careful here and check output!
m <- m %>% group_by(num) %>% mutate(PS = any(value>5))
m$num <- paste0("PS", m$num)
md <- dcast(m, SEQN ~ num, value.var = "PS")
setPD <- merge(setPD, md, by="SEQN")
This melts your data first and creates a variable num that indicates your tooth. Again, make sure that this works. I have used the fact that in your data, the tooth number all appear in the 4th and 5th place in the character string. Make sure this is true, and adjust the code otherwise. Then I create a variable PS which indicates whether any of the columns that contain the tooth identifer has a value larger than 5. Last but not least I recast the data so that you have the values of PD02, PD03, etc in columns again, before I merge this to the old dataset. The line with paste0 merely creates the variable names that you want to have.

Related

How to deselect many variables without removing specific variables in dplyr

Say there is a data frame that has a structure like this:
df <- data.frame(x.1 = rnorm(n=100),
x.2 = rnorm(n=100),
x.3 = rnorm(n=100),
x.special = rnorm(n=100),
x.y.z = rnorm(n=100))
Inspecting the head, we get this output:
x.1 x.2 x.3 x.special x.y.z
1 1.01014580 -1.4047666 1.50374721 -0.8339784 -0.0831983
2 0.44307253 -0.4695634 -0.71951820 1.5758893 1.2163749
3 -0.87051845 0.1793721 -0.26838489 -1.0477929 -1.0813926
4 -0.28491936 0.4186763 -0.07494088 -0.2177471 0.3490200
5 -0.03769566 -0.3656822 0.12478667 -0.7975811 -0.4481193
6 -0.83808036 0.6842561 0.71231627 -0.3348798 1.7418141
Suppose I want to remove all the numbered variables but keep the x.special and x.y.z variables. I know that I can easily deselect with:
df %>%
select(-x.1,
-x.2,
-x.3)
However for something like 50 or 100 variables like this, it would become cumbersome. Similarly, I know I can pick patterns like so:
df %>%
select(-contains("x."))
But this of course removes everything because the special variables have the . name. Is there a more intelligent way of picking these variables? I feel like there is an option for finding the numeric variable in the name.
# use regex to remove these colums...
colsBool <- !grepl(x=names(df), pattern="\\d")
Result:
> head(df[, colsBool])
x.special x.y.z
1 1.1145156 -0.4911891
2 0.7059937 0.4500111
3 -0.6566422 1.6085353
4 -0.6322514 -0.8017260
5 0.4785106 0.6014765
6 -0.8508830 -0.5078307
Regular expressions are your best friend in this situation.
For instance, if you wanted to remove columns whose last value is a number, just do !grepl(pattern = "\\d$",...), the $ sign at the end of the expression will match only columns ending with a number. The ! sign in front of the grepl() expression negates the values in the match, that is, a TRUE becomes FALSE and vice-versa.

Find differences betwen 2 dataframes with different lengths

I have two dataframes with each two columns c("price", "size") with different lengths.
Each price must be linked to its size. It's two lists of trade orders. I have to discover the differences between the two dataframes knowing that the two databases can have orders that the other doesn't have and vice versa. I would like an output with the differences or two outputs, it doesn't matter. But I need the row number in the output to find where are the differences in the series.
Here is sample data :
> out
price size
1: 36024.86 0.01431022
2: 36272.00 0.00138692
3: 36272.00 0.00277305
4: 36292.57 0.05420000
5: 36292.07 0.00403948
---
923598: 35053.89 0.30904890
923599: 35072.76 0.00232000
923600: 35065.60 0.00273000
923601: 35049.36 0.01760000
923602: 35037.23 0.00100000
>bit
price size
1: 37279.89 0.01340020
2: 37250.84 0.00930000
3: 37250.32 0.44284049
4: 37240.00 0.00056491
5: 37215.03 0.99891906
---
923806: 35053.89 0.30904890
923807: 35072.76 0.00232000
923808: 35065.60 0.00273000
923809: 35049.36 0.01760000
923810: 35037.23 0.00100000
For example, I need to know if the first row of the database out is in the database bit.
I've tried many functions : comparedf()
summary(comparedf(bit, out, by = c("price","size"))
but I've got error:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin ||
!anyDuplicated(f__, :
I've tried compare_df() :
compareout=compare_df(out,bit,c("price","size"))
But I know the results are wrong, I've only 23 results and I know that there are more than 200 differences minimum.
I've tried match(), which() functions but it doesn't get the results I search.
If you have any other methods, I will take them.
Perhaps you could just do inner_join on out and bit by price and size? But first make id variable for both data.frame's
library(dplyr)
out$id <- 1:nrow(out)
bit$id <- 1:nrow(bit)
joined <- inner_join(bit, out, by = c("price", "size"))
Now we can check which id from out and bit are not present in joined table:
id_from_bit_not_included_in_out <- bit$id[!bit$id %in% joined$id.x]
id_from_out_not_included_in_bit <- out$id[!out$id %in% joined$id.y]
And these ids are the rows not included in out or bit, i.e. variable id_from_bit_not_included_in_out contains rows present in bit, but not in out and variable id_from_out_not_included_in_bit contains rows present in out, but not in bit
First attempt here. It will be difficult to do a very clean job with this data tho.
The data I used:
out <- read.table(text = "price size
36024.86 0.01431022
36272.00 0.00138692
36272.00 0.00277305
36292.57 0.05420000
36292.07 0.00403948
35053.89 0.30904890
35072.76 0.00232000
35065.60 0.00273000
35049.36 0.01760000
35037.23 0.00100000", header = T)
bit <- read.table(text = "price size
37279.89 0.01340020
37250.84 0.00930000
37250.32 0.44284049
37240.00 0.00056491
37215.03 0.99891906
37240.00 0.00056491
37215.03 0.99891906
35053.89 0.30904890
35072.76 0.00232000
35065.60 0.00273000
35049.36 0.01760000
35037.23 0.00100000", header = T)
Assuming purely that row 1 of out should match with row 1 of bit a simple solution could be:
df <- cbind(distinct(out), distinct(bit))
names(df) <- make.unique(names(df))
However judging from the data you have provided I am not sure if this is the way to go (big differences in the first few rows) so maybe try sorting the data first?:
df <- cbind(distinct(out[order(out$price, out$size),]), distinct(bit[order(bit$price, bit$size),]))
names(df) <- make.unique(names(df))

How can I remove all rows in which one of the values satisfies a condition? apply() not working

I have a dataframe with two columns, and I want to remove all rows in which one of the values in each row is either smaller than 0 or bigger than a specified number (for the sake of argument let's call this 2000).
This is the dataframe
structure(list(xx = c(134.697838289433, 222.004361198059, 131.230956160172,
206.658871436917, 111.25078650042, 241.965831417648, 171.46912254679,
116.860666678254, 196.894985820028, 135.309699618638, 133.082437475133,
185.509376072318, 718.998297748551, 745.902984215293, 752.655615982603,
633.199684348903, 764.983924278636, 694.856525559398, 773.56532078895,
757.32358575657, 709.924023536199, 658.863564702233, 733.076690816291,
745.9306541374, 788.134444412421, 759.445624288787, 796.989170170713,
632.952543475636, 746.103571612919, 715.296116988119, 766.899107551248,
628.268453830605, 658.574104878488, 689.916530654021, 820.841422812349,
709.097957368612, 793.109262845978, 716.713801941779, 726.83260343463,
746.547080776193, 759.644057119419, 757.41275593749, 723.539527360327,
839.816318612061, 795.655016954661, 766.245386324182, 756.300015395758,
808.255074043333, 745.915083305187, 685.465492956583, 694.567959198318,
786.919467838804, 699.521900871042, 749.041223560884, 700.079697765533,
753.805501259023, 745.080253997501, 846.982894686656, 775.66384433188,
809.39649823454, 841.009469183585, 790.987061753069, 792.441925234251,
1377.97739642236, 1353.19738061511, 1259.94435540633, 1276.25060187203,
1331.26106031956, 1227.68481147557, 1345.95561236514, 1309.51489973952,
1285.62680259649, 1329.46388049714, 1256.00394500077, 1294.0505313591,
1349.09440181876, 1294.72661682462, 1339.38577920408, 1277.114896541,
1267.54884404031, 1291.32793111573, 1254.85565551553, 1298.78499697743,
1283.89664572036, 1273.92831816666, 1310.221891323, 1327.89682404014,
1310.81394400863, 595.342571560588, 689.892254230306, 562.390766853428,
736.319251501976, 609.577261412134, 641.591997384705, 682.957658696869,
580.320759093636, 560.64984978551, 643.487033739876, 688.457314818318,
631.156743281308, 659.535909106305), yy = c(1169.70954243065,
1259.830208937, 1172.21661417439, 1097.62724268622, 1198.15024522658,
1231.90665701131, 1211.36196331211, 1152.4207367321, 1287.57553021171,
1120.61366993258, 1234.70366243878, 1258.47454705197, 893.983957068268,
994.99854601335, 916.330965835536, 947.536265806389, 950.345051732045,
934.313361799171, 1018.76942964176, 918.182358835366, 1005.51128858608,
967.577307930044, 997.239384198691, 995.866808447868, 962.292293255127,
864.624084608006, 895.091604672023, 906.22162647536, 1024.45206885923,
908.693026118345, 923.625774785301, 931.801569764776, 1007.88553380827,
848.55309782664, 927.608364899483, 1024.60765786828, 1085.64295260059,
1057.90632135992, 1195.30607038065, 1151.39888340311, 1168.2831257626,
1137.15375447446, 1145.42393212912, 1108.89072769468, 1075.15451622384,
1129.91711324634, 1191.94330388541, 1132.41649984784, 1210.89342724886,
1100.60339252755, 1083.5987922884, 1056.69487941162, 1150.2707936581,
1055.75678264632, 1055.53323667429, 1049.79655119467, 1166.86598024805,
1141.82593378866, 1066.37755267981, 1160.55793904653, 1162.65728735716,
1060.29360609309, 1107.40480300404, 1825.01445883899, 1802.95011068891,
1692.84948509132, 1675.97166713074, 1758.10341887143, 1788.48414279738,
1680.15824054313, 1756.01930833023, 1706.98458587119, 1770.57687329296,
1692.21991398915, 1835.60585163662, 1790.6487914694, 1787.52076839767,
1704.25313427813, 1735.96312434652, 1813.02044772293, 1847.21159474717,
1725.63580525853, 1841.32016678, 1713.80845602987, 1770.39756152819,
1747.72988313376, 1778.13110060636, 1786.3871288087, 6.01666671271317,
19.2497357431764, 9.6964112500295, -3.23929433528044, 89.4863211231715,
86.0082947221296, 42.7982120490919, 2.19886414532234, 12.8780844043502,
30.694893442471, 7.58386594976601, 83.8385161493349, 36.4551491976192
)), row.names = 100:200, class = "data.frame")
First I create a function to eliminate points which satisfy the conditions.
routliers<-function(x){
if(x>2000|x<0){
rm(x)
}
}
Then I use the apply function across the rows to eliminate the points using the above function (the above dput() is named cds).
cds<-data.frame(apply(cds,1,routliers))
But this eliminates all points
length(cds)
[1]0
Interestingly, if I replace the rm() function with print(), then I do print out the desired points when using the apply function, but I receive the error "arguments imply differing number of rows: 0, 2". Also, I am not sure when I use the apply() function that the specified function applies to both columns of data, as I am not seeing any data points in print() that satisfies the condition for ONLY the second column of points. The first column is x co-ordinates, and the second column is y co-ordinates. I think the error "arguments imply differing number of rows:0,2" suggests that only the first value in the row is being tested against the function.
How can I write code in which rows are eliminated if one or more of the data points satisfies my condition?
This is easy to do when the columns are separate vectors, (x<-x[!condition]) however I cannot add them together again easily so I prefer to do this on a dataframe of points.
Please check if this code works for you, with df being the data you shared:
#Code
new <- df[!rowSums(df < 0 | df>2000) > 0, ]
Or this:
#Code 2
new <- df[which(apply(df,1,function(x) sum(x<0 | x>2000))==0),]
Let's make your function return TRUE for an outlier and FALSE for not an outlier. And it can be vectorized:
is_outlier = function(x) {
x > 2000 | x < 0
}
Here's how we'd use this to drop rows with outliers in a single column:
cds[!is_outlier(cds$xx), ]
For two columns, we can combine the is_outlier results with & or |. I can't tell from your text whether you want to remove rows where xx AND yy are outliers, or remove rows where xx OR yy are outliers. So pick the appropriate version:
cds[!is_outlier(cds$xx) & !is_outlier(cds$yy), ]
cds[!is_outlier(cds$xx) | !is_outlier(cds$yy), ]

R: Replace all Values that are not equal to a set of values

All.
I've been trying to solve a problem on a large data set for some time and could use some of your wisdom.
I have a DF (1.3M obs) with a column called customer along with 30 other columns. Let's say it contains multiple instances of customers Customer1 thru Customer3000. I know that I have issues with 30 of those customers. I need to find all the customers that are NOT the customers I have issues and replace the value in the 'customer' column with the text 'Supported Customer'. That seems like it should be a simple thing...if it werent for the number of obs, I would have loaded it up in Excel, filtered all the bad customers out and copy/pasted the text 'Supported Customer' over what remained.
Ive tried replace and str_replace_all using grepl and paste/paste0 but to no avail. my current code looks like this:
#All the customers that have issues
out <- c("Customer123", "Customer124", "Customer125", "Customer126", "Customer127",
"Customer128", ..... , "Customer140")
#Look for everything that is NOT in the list above and replace with "Enabled"
orderData$customer <- str_replace_all(orderData$customer, paste0("[^", paste(out, collapse =
"|"), "]"), "Enabled Customers")
That code gets me this error:
Error in stri_replace_all_regex(string, pattern, fix_replacement(replacement), :
In a character range [x-y], x is greater than y. (U_REGEX_INVALID_RANGE)
I've tried the inverse of this approach and pulled a list of all obs that dont match the list of out customers. Something like this:
in <- orderData %>% filter(!customer %in% out) %>% select(customer) %>%
distinct(customer)
This gets me a much larger list of customers that ARE enabled (~3,100). Using the str_replace_all and paste approach seems to have issues though. At this large number of patterns, paste no longer collapses using the "|" operator. instead I get a string that looks like:
"c(\"Customer1\", \"Customer2345\", \"Customer54\", ......)
When passed into str_replace_all, this does not match any patterns.
Anyways, there's got to be an easier way to do this. Thanks for any/all help.
Here is a data.table approach.
First, some example data since you didn't provide any.
customer <- sample(paste0("Customer",1:300),5000,replace = TRUE)
orderData <- data.frame(customer = sample(paste0("Customer",1:300),5000,replace = TRUE),stringsAsFactors = FALSE)
orderData <- cbind(orderData,matrix(runif(0,100,n=5000*30),ncol=30))
out <- c("Customer123", "Customer124", "Customer125", "Customer126", "Customer127", "Customer128","Customer140")
library(data.table)
setDT(orderData)
result <- orderData[!(customer %in% out),customer := gsub("Customer","Supported Customer ",customer)]
result
customer 1 2 3 4 5 6 7 8 9
1: Supported Customer 134 65.35091 8.57117 79.594166 84.88867 97.225276 84.563997 17.15166 41.87160 3.717705
2: Supported Customer 225 72.95757 32.80893 27.318046 72.97045 28.698518 60.709381 92.51114 79.90031 7.311200
3: Supported Customer 222 39.55269 89.51003 1.626846 80.66629 9.983814 87.122153 85.80335 91.36377 14.667535
4: Supported Customer 184 24.44624 20.64762 9.555844 74.39480 49.189537 73.126275 94.05833 36.34749 3.091072
5: Supported Customer 194 42.34858 16.08034 34.182737 75.81006 35.167769 23.780069 36.08756 26.46816 31.994756
---

Applying var.test to every row

Given a dataframe where colnames have sample identifiers (i.e. G1 and G2) I want to perform var.test() on each row and store some the results in a new matrix.
>dat
GSM475355_G1 GSM475367_G0 GSM475370_G0 GSM475373_G1 GSM475376_G0 GSM475381_G1 GSM475383_G1 GSM475385_G1
27411 -0.89388704 0.01987934 0.38278532 0.071681020 0.373300080 -0.455644130 -0.18787241 0.65458155
35558 1.74279880 0.54368210 -0.24144077 0.307267200 -0.059902190 -0.052984238 0.13823795 0.20645618
43304 1.94601350 -0.05378771 0.02680111 -0.065221310 0.130765910 -0.090313435 -0.05756617 -0.02083588
33721 -0.29451323 0.01806831 -0.08260250 0.140903470 -0.006454945 -0.128416540 0.05237675 -0.03429079
8310 -0.79846334 1.00792070 -0.35607958 0.378528120 0.081913950 0.112047670 -0.34938622 0.25825214
46204 -3.02495300 0.07315350 0.79066850 0.091570854 0.428258900 0.565763500 0.18908596 0.88739204
21809 0.07164812 0.06946850 0.00000000 -0.005378723 -0.081427574 -0.009929657 0.15938330 -0.05795145
23277 -0.19507313 0.22079802 -0.11173201 -0.139470100 -0.059999466 0.159433840 -0.23357010 -0.02099037
My attempt have been to use apply but it doesn't seem to work, I have tried a few variation which are below for simplicity I have firstly grep the colnames with appropriate identifiers into new dataframes and named them G1 and G2. e.g.
G1<-dat[,grep("G1", colnames(dat))]
The apply attempts:
apply(G1, 1 ,var.test, y = G0)
apply(as.numeric(G1),as.numeric(G0),1, var.test))
And a few other variations with no success. I have even attempted one row e.g(G1[1,]) to test the function.
Thanks to #thelatemail for pointing me in the right direction! It appears for some reason you need to use as.numeric to get this to work.
dat$F.test <- apply(dat, 1, function(x) var.test(as.numeric(x[grep("G1", colnames(dat))]), as.numeric(x[grep("G0", colnames(dat))]))[1])

Resources