R "object not found" error even though the object exists in the data - r

I have a data frame in which the variables were created using the unite function from dplyr. When I load the new data frame, all the variables are there as they should be; however, when I try to work with it further and use the mutate function, I get an error saying: ! object 'inverted_facing' not found. I tried renaming the variables, but they are not found by the rename function either. All the online advice covers either forgetting to create the variables or misspelling them in the code, but that is not the case here: they are in the data frame and they are spelled correctly in the code. Why are they not found? Can anyone help with fixing this?
Data Frame:
> subj_means_index_Acc
# A tibble: 100 × 7
# Groups: subject, site, category [100]
subject site category `inverted_facing ` inverted_nonFacing `upright _facing ` `upright _nonFacing`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 P01 EBA "chairs " 0.875 0.969 0.969 1
2 P01 EBA "targets" 0.656 0.75 0.906 1
3 P01 OPA "chairs " 0.969 1 0.906 0.969
4 P01 OPA "targets" 0.469 0.688 0.906 0.969
5 P02 EBA "chairs " 0.906 0.875 0.938 0.906
6 P02 EBA "targets" 0.812 0.812 0.906 0.938
7 P02 OPA "chairs " 0.938 0.781 0.875 0.938
8 P02 OPA "targets" 0.781 0.906 0.875 0.906
9 P03 EBA "chairs " 0.719 0.938 0.906 0.781
10 P03 EBA "targets" 0.938 0.844 0.969 0.938
# … with 90 more rows
Code:
subj_means_index_Acc <- subjmeans_condition %>%
  select(subject, site, category, orientation, direction, mAcc) %>%
  unite("condition", orientation, direction) %>%
  spread(condition, mAcc)

subj_means_index_Acc <- subj_means_index_Acc %>%
  mutate(inv_eff_facing = (1 - inverted_facing) - (1 - upright_facing),
         inv_eff_nonfacing = (1 - inverted_nonFacing) - (1 - upright_nonFacing)) %>%
  mutate(inv_eff_fac_nf = inv_eff_facing - inv_eff_nonfacing)
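Note that in the printed header several names appear in backticks with stray spaces inside them (`inverted_facing `, `upright _facing `), which would explain the error: the column is literally named "inverted_facing " with a trailing space, so the bare name inverted_facing does not exist. A minimal diagnostic-and-repair sketch, assuming those spaces are really in the names (the whitespace-stripping step is illustrative, not from the original question):

library(dplyr)

# Print the raw names; stray spaces are what force the backticks in the printout
names(subj_means_index_Acc)

# Remove all whitespace from the column names, then mutate() as usual
subj_means_index_Acc <- subj_means_index_Acc %>%
  rename_with(~ gsub("\\s+", "", .x))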

Related

Loop on several variables with the same suffix in R

I have a database which looks like this but with many more rows and columns: several variables (x, y, z) measured at different times (1, 2, 3).
df <- tibble(
  x1 = rnorm(10),
  x2 = rnorm(10),
  x3 = rnorm(10),
  y1 = rnorm(10),
  y2 = rnorm(10),
  y3 = rnorm(10),
  z1 = rnorm(10),
  z2 = rnorm(10),
  z3 = rnorm(10),
)
I am trying to create dummy variables from the variables with the same suffix (measured at the same time) like this:
df <- df %>%
  mutate(var1 = ifelse(x1 > 0 & (y1 < 0.5 | z1 < 0.5), 0, 1)) %>%
  mutate(var2 = ifelse(x2 > 0 & (y2 < 0.5 | z2 < 0.5), 0, 1)) %>%
  mutate(var3 = ifelse(x3 > 0 & (y1 < 0.5 | z3 < 0.5), 0, 1))
I am used to coding in SAS or Stata, so I would like to use a function or a loop because I have many more variables in my database.
But I think I don't have the right approach in R to deal with this.
Thank you very much for your help!
{dplyover} makes this kind of operation easy (disclaimer: I'm the maintainer). Note that your desired output contains a typo: I think you want to use all variables with the same digit (1, 2, 3 and so on) in each calculation:
df <- df %>%
  mutate(var1 = ifelse(x1 > 0 & (y1 < 0.5 | z1 < 0.5), 0, 1)) %>%
  mutate(var2 = ifelse(x2 > 0 & (y2 < 0.5 | z2 < 0.5), 0, 1)) %>%
  mutate(var3 = ifelse(x3 > 0 & (y3 < 0.5 | z3 < 0.5), 0, 1))
If that is the case, we can use dplyover::over to apply the same function over a vector. Here we construct the vector with cut_names("^[a-z]{1}"), which cuts the leading letter from the column names and leaves the unique endings, here c(1, 2, 3). We can then construct the variable names using a special syntax: .("x{.x}"). Here .x evaluates to the current number in our vector, so in the first iteration it returns the object name x1 (not a string!), which we can use inside the function argument of over.
library(dplyr)
library(dplyover) # Only on GitHub: https://github.com/TimTeaFan/dplyover

df %>%
  mutate(over(cut_names("^[a-z]{1}"),
              ~ ifelse(.("x{.x}") > 0 & (.("y{.x}") < 0.5 | .("z{.x}") < 0.5), 0, 1),
              .names = "var{x}"))
#> # A tibble: 10 x 12
#> x1 x2 x3 y1 y2 y3 z1 z2 z3 var1
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.690 0.550 0.911 0.203 -0.111 0.530 -2.09 0.189 0.147 0
#> 2 -0.238 1.32 -0.145 0.744 1.05 -0.448 2.05 -1.04 1.50 1
#> 3 0.888 0.898 -1.46 -1.87 -1.14 1.59 1.91 -0.155 1.46 0
#> 4 -2.78 -1.34 -0.486 -0.0674 0.246 0.141 0.154 1.08 -0.319 1
#> 5 -1.20 0.835 1.28 -1.32 -0.674 0.115 0.362 1.06 0.515 1
#> 6 0.622 -0.713 0.0525 1.79 -0.427 0.819 -1.53 -0.885 0.00237 0
#> 7 -2.54 0.0197 0.942 0.230 -1.37 -1.02 -1.55 -0.721 -1.06 1
#> 8 -0.434 1.97 -0.274 0.848 -0.482 -0.422 0.197 0.497 -0.600 1
#> 9 -0.316 -0.219 0.467 -1.97 -0.718 -0.442 -1.39 -0.877 1.52 1
#> 10 -1.03 0.226 2.04 0.432 -1.02 -0.535 0.954 -1.11 0.804 1
#> # ... with 2 more variables: var2 <dbl>, var3 <dbl>
Alternatively, we can use dplyr::across together with cur_column(), get() and gsub() to alter the name of the column on the fly. To name the new variables correctly, we use gsub() in the .names argument of across and wrap the expression in curly braces {} so it gets evaluated.
library(dplyr)

df %>%
  mutate(across(starts_with("x"),
                ~ {
                  cur_c <- dplyr::cur_column()
                  ifelse(.x > 0 & (get(gsub("x", "y", cur_c)) < 0.5 | get(gsub("x", "z", cur_c)) < 0.5), 0, 1)
                },
                .names = '{gsub("x", "var", .col)}'))
#> # A tibble: 10 x 12
#> x1 x2 x3 y1 y2 y3 z1 z2 z3 var1
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.423 -1.42 -1.15 -1.54 1.92 -0.511 -0.739 0.501 0.451 1
#> 2 -0.358 0.164 0.971 -1.61 1.96 -0.675 -0.0188 -1.88 1.63 1
#> 3 -0.453 -0.758 -0.258 -0.449 -0.795 -0.362 -1.81 -0.780 -1.90 1
#> 4 0.855 0.335 -1.36 0.796 -0.674 -1.37 -1.42 -1.03 -0.560 0
#> 5 0.436 -0.0487 -0.639 0.352 -0.325 -0.893 -0.746 0.0548 -0.394 0
#> 6 -0.228 -0.240 -0.854 -0.197 0.884 0.118 -0.0713 1.09 -0.0289 1
#> 7 -0.949 -0.231 0.428 0.290 -0.803 2.15 -1.11 -0.202 -1.21 1
#> 8 1.88 -0.0980 -2.60 -1.86 -0.0258 -0.965 -1.52 -0.539 0.108 0
#> 9 0.221 1.58 -1.46 -0.806 0.749 0.506 1.09 0.523 1.86 0
#> 10 0.0238 -0.389 -0.474 0.512 -0.448 0.178 0.529 1.56 -1.12 1
#> # ... with 2 more variables: var2 <dbl>, var3 <dbl>
Created on 2022-06-08 by the reprex package (v2.0.1)
You could restructure your data along the principles of tidy data (see e.g. https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html).
Here, pivoting to a long format using the tidyverse:
library(tidyverse)

df <- df |>
  pivot_longer(everything()) |>
  separate(name, c("var", "time"), sep = "(?=[0-9])") |>
  pivot_wider(id_cols = "time",
              names_from = "var",
              names_prefix = "var_",
              values_from = "value",
              values_fn = list) |>
  unnest(-time) |>
  mutate(new_var = ifelse(var_x > 0 & (var_y < 0.5 | var_z < 0.5), 0, 1))
df
You would probably want to keep the data in a long format, but if you want, you can pivot_wider and get back to the format you started with. E.g.
df |>
  pivot_wider(values_from = c(starts_with("var_"), "new_var"),
              names_from = "time",
              values_fn = list) |>
  unnest(everything())
As you suggested, a solution using a loop is definitely possible.
# times as unique non-alphabetical parts of column names
times <- unique(gsub('[[:alpha:]]', '', names(df)))

for (time in times) {
  # column names for the current time
  xyz <- paste0(c('x', 'y', 'z'), time)
  df[[paste0('var', time)]] <-
    ifelse(df[[xyz[1]]] > 0 & (df[[xyz[2]]] < .5 | df[[xyz[3]]] < .5), 0, 1)
}
Another way I can think of is transforming the data into a 3D array (observation × time × variable) so that you can do the computation for all variables at once.
times <- unique(gsub('[[:alpha:]]', '', names(df)))

df.arr <- sapply(c('x', 'y', 'z'),
                 function(var) as.matrix(df[, paste0(var, times)]),
                 simplify = 'array')

new.vars <- ifelse(df.arr[, , 1] > 0 & (df.arr[, , 2] < 0.5 | df.arr[, , 3] < 0.5), 0, 1)
colnames(new.vars) <- paste0('var', times)
cbind(df, new.vars)
Here, sapply creates a matrix from the columns of measurements for each variable at the different times, and stacks these matrices into a 3D array.
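To make the layout concrete, a quick check (assuming df as defined above, with 10 rows):

dim(df.arr)            # 10 3 3: observations x times x variables
dimnames(df.arr)[[3]]  # "x" "y" "z": the third dimension indexes the variables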
If you trust (or ensure) the correct ordering of columns in the data frame, instead of using sapply you can create the array just by modifying the object's dimensions. I didn't do any benchmarking, but I guess this could be the most computationally efficient solution (if that matters).
df.arr <- as.matrix(df)
dim(df.arr) <- c(dim(df.arr) / c(1, 3), 3)
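Since a matrix is filled column by column, this reshaped array has the same observation × time × variable layout as the sapply version (assuming the columns really are ordered x1-x3, y1-y3, z1-z3), so the same computation applies:

new.vars <- ifelse(df.arr[, , 1] > 0 & (df.arr[, , 2] < 0.5 | df.arr[, , 3] < 0.5), 0, 1)
colnames(new.vars) <- paste0('var', times)
cbind(df, new.vars)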

How to flip column variable names to row labels? [duplicate]

This question already has answers here:
Transposing a dataframe maintaining the first column as heading
(5 answers)
Transposition of a Tibble Using Pivot_Longer() and Pivot_Wider (Tidyverse) [duplicate]
(1 answer)
Closed 1 year ago.
I have the below tibble.
# A tibble: 2 x 6
Trial_Type CT_tib_all CT_lum_all CT_tho_all CT_gps_all CT_vest_all
* <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Pre 0.244 0.209 0.309 0.315 0.310
2 Post 0.254 0.211 0.302 0.313 0.316
I would like to flip the rows and columns so I end up with a 6 x 2 tibble, but I'm not sure of the easiest way to do this. How do I get the column variable names to become row labels and the row labels as column variables (Pre and Post)?
You can use pivot_longer and pivot_wider -
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -Trial_Type) %>%
  pivot_wider(names_from = Trial_Type, values_from = value)
# name Pre Post
# <chr> <dbl> <dbl>
#1 CT_tib_all 0.244 0.254
#2 CT_lum_all 0.209 0.211
#3 CT_tho_all 0.309 0.302
#4 CT_gps_all 0.315 0.313
#5 CT_vest_all 0.31 0.316
In data.table -
library(data.table)

dcast(melt(setDT(df), id.vars = 'Trial_Type'),
      variable ~ Trial_Type, value.var = 'value')
t, i.e. the transpose function in base R, may also be used, in combination with tibble::rownames_to_column and tibble::column_to_rownames.
library(tibble)
library(dplyr)

df <- read.table(text = 'Trial_Type CT_tib_all CT_lum_all CT_tho_all CT_gps_all CT_vest_all
Pre 0.244 0.209 0.309 0.315 0.310
Post 0.254 0.211 0.302 0.313 0.316', header = T)

df %>%
  tibble::column_to_rownames('Trial_Type') %>%
  t() %>%
  as.data.frame() %>%
  rownames_to_column('Trial_Type')
#> Trial_Type Pre Post
#> 1 CT_tib_all 0.244 0.254
#> 2 CT_lum_all 0.209 0.211
#> 3 CT_tho_all 0.309 0.302
#> 4 CT_gps_all 0.315 0.313
#> 5 CT_vest_all 0.310 0.316
Created on 2021-05-28 by the reprex package (v2.0.0)
We can use transpose from data.table
data.table::transpose(df, make.names = 'Trial_Type', keep.names = 'name')
# name Pre Post
#1 CT_tib_all 0.244 0.254
#2 CT_lum_all 0.209 0.211
#3 CT_tho_all 0.309 0.302
#4 CT_gps_all 0.315 0.313
#5 CT_vest_all 0.310 0.316
A base R option using reshape
reshape(
  cbind(name = df$Trial_Type, stack(df[-1])),
  direction = "wide",
  idvar = "ind",
  timevar = "name"
)
gives
ind values.Pre values.Post
1 CT_tib_all 0.244 0.254
3 CT_lum_all 0.209 0.211
5 CT_tho_all 0.309 0.302
7 CT_gps_all 0.315 0.313
9 CT_vest_all 0.310 0.316
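If you want the same headers as in the other answers, you can rename afterwards (a small cleanup step, not part of the original answer):

out <- reshape(
  cbind(name = df$Trial_Type, stack(df[-1])),
  direction = "wide",
  idvar = "ind",
  timevar = "name"
)
names(out) <- c("Trial_Type", "Pre", "Post")
out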

Odd behavior of dplyr::between and filter

I have a data.frame I want to filter based on whether the range from low to high contains zero. Here's an example
head(toy)
# A tibble: 6 x 3
difference low high
<dbl> <dbl> <dbl>
1 0.0161 -0.143 0.119
2 0.330 0.0678 0.656
3 0.205 -0.103 0.596
4 0.521 0.230 0.977
5 0.328 0.177 0.391
6 -0.0808 -0.367 0.200
I could swear I have used dplyr::between() to do this kind of filtering operation a million times (even with columns of class datetime, where it warns about S3 objects). But I can't find what's wrong with this one.
# Does not find anything
toy %>%
  filter(!dplyr::between(0, low, high))

# Maybe it's because it needs `x` to be a vector, using mutate
# Does not find anything
toy %>%
  mutate(zero = 0) %>%
  filter(!dplyr::between(zero, low, high))

# If we check the logic, all "keep" values come out FALSE
toy %>%
  mutate(zero = 0,
         keep = !dplyr::between(zero, low, high))

# data.table::between works
toy %>%
  filter(!data.table::between(0, low, high))

# Regular logic works
toy %>%
  filter(low > 0 | high < 0)
The data below:
> dput(toy)
structure(list(difference = c(0.0161058505175378, 0.329976207353122,
0.20517072042705, 0.520837282826481, 0.328289597476641, -0.0807728725339096,
0.660320444135006, 0.310679750033675, -0.743294517440579, -0.00665462977775899,
0.0890903981794149, 0.0643321993757249, 0.157453334405998, 0.107320325893175,
-0.253664041938671, -0.104025850079389, -0.284835573264143, -0.330557762091307,
-0.0300387610595219, 0.081297046765014), low = c(-0.143002432870633,
0.0677907794288728, -0.103344717845837, 0.229753302951895, 0.176601773133456,
-0.366899428200429, 0.403702557199546, 0.0216878391530755, -1.01129163487875,
-0.222395625167488, -0.135193611295608, -0.116654715121314, -0.168581379777843,
-0.281919444558125, -0.605918194917671, -0.364539852350809, -0.500147478407119,
-0.505906196974183, -0.233810558283787, -0.193048952382206),
high = c(0.118860787421672, 0.655558974886329, 0.595905673925067,
0.97748896372657, 0.391043536410999, 0.199727242557477, 0.914173497837859,
0.633804982827898, -0.549942089679123, 0.19745782761473,
0.340823604797603, 0.317956343103116, 0.501279107093568,
0.442497779066522, 0.0721480109893818, 0.280593530192991,
-0.0434862536882377, -0.229723776097642, 0.22550243301984,
0.252686968655449)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
Just in case somebody finds it useful
> "between" %in% conflicts()
[1] FALSE
> packageVersion("dplyr")
[1] ‘1.0.2’
dplyr::between() is not vectorized over its left and right arguments, which are expected to be single values. One thing you could do is:
toy %>%
  rowwise() %>%
  filter(!dplyr::between(0, low, high))
difference low high
<dbl> <dbl> <dbl>
1 0.330 0.0678 0.656
2 0.521 0.230 0.977
3 0.328 0.177 0.391
4 0.660 0.404 0.914
5 0.311 0.0217 0.634
6 -0.743 -1.01 -0.550
7 -0.285 -0.500 -0.0435
8 -0.331 -0.506 -0.230
data.table::between() is vectorized: that's the reason why it works.
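The vectorized check is just elementwise comparisons, so a dependency-free equivalent (a sketch restating the "regular logic" from the question via De Morgan's law) is:

library(dplyr)

# keep rows whose [low, high] interval does not contain zero;
# !(low <= 0 & high >= 0) is the same as (low > 0 | high < 0)
toy %>%
  filter(!(low <= 0 & high >= 0))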
We could use map2
library(dplyr)
library(purrr)

toy %>%
  filter(!map2_lgl(low, high, ~ between(0, .x, .y)))
-output
# A tibble: 8 x 3
difference low high
<dbl> <dbl> <dbl>
1 0.330 0.0678 0.656
2 0.521 0.230 0.977
3 0.328 0.177 0.391
4 0.660 0.404 0.914
5 0.311 0.0217 0.634
6 -0.743 -1.01 -0.550
7 -0.285 -0.500 -0.0435
8 -0.331 -0.506 -0.230

R Markdown: 2 columns should have the same names

I have a table in which I've performed two statistical tests, so I have a statistic and a p.value twice. R appended ".x" after the first pair and ".y" after the second, because two columns cannot have the same name in R.
Now I want to insert my data frame in R Markdown and convert it to a pdf file. Is there a way to reshape the table so that both pairs of columns have the same names?
Here is my current table:
# A tibble: 6 x 4
statistic.x p.value.x statistic.y p.value.y
<dbl> <chr> <dbl> <chr>
1 0.533 0.595 115806 0.791
2 0.276 0.783 60380 0.674
3 -0.481 0.633 28392 0.116
4 2.68 0.008 * * 94507 0.195
5 1.95 0.054 56902 0.349
And I want to have this table in R Markdown:
# A tibble: 6 x 4
statistic p.value statistic p.value
<dbl> <chr> <dbl> <chr>
1 0.533 0.595 115806 0.791
2 0.276 0.783 60380 0.674
3 -0.481 0.633 28392 0.116
4 2.68 0.008 * * 94507 0.195
5 1.95 0.054 56902 0.349
Here is the code for my data:
structure(list(statistic.x = c(0.533, 0.276, -0.481, 2.678, 1.95,
1.996), p.value.x = c("0.595", "0.783", "0.633", "0.008 * *",
"0.054", "0.051"), statistic.y = c(115806, 60380, 28392, 94507,
56902, 37688), p.value.y = c("0.791", "0.674", "0.116", "0.195",
"0.349", "0.397")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
I didn't have any problem setting names using colnames
> colnames(dat) <- c( "statistic", "p.value", "statistic", "p.value" )
> dat
statistic p.value statistic p.value
1 0.533 0.595 115806 0.791
2 0.276 0.783 60380 0.674
3 -0.481 0.633 28392 0.116
4 2.678 0.008 * * 94507 0.195
5 1.950 0.054 56902 0.349
6 1.996 0.051 37688 0.397
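For the pdf output specifically, another option is to keep the data frame's names unique and set duplicate display headers only at render time; a sketch using knitr::kable (the col.names values are just display labels, so duplicates are fine there):

library(knitr)

# the rendered table shows the duplicate headers,
# while the underlying data frame keeps its unique names
kable(dat, col.names = c("statistic", "p.value", "statistic", "p.value"))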

Adding standardized variables to a data frame using dplyr and a for loop

I have a two-part problem. I've searched all over stack and found answers related to my problems, but no variations I've tried have worked yet. Thanks in advance for any help!
I have a large data frame that contains many variables.
First, I want to (1) standardize a variable within the levels of another variable (in my case, speaker), and (2) filter out values greater than 2 standard deviations from the mean after the variable has been standardized. (1) and (2) can be taken care of by a function using dplyr.
Second, I have many variables I want to do this for, so I'm trying to find an automated way, such as a for loop.
Problem 1: Writing a function containing dplyr functions
Here is a sample of what my data frame looks like:
df = data.frame(
  speaker = c("eng1","eng1","eng1","eng1","eng1","eng1","eng2","eng2","eng2","eng2","eng2"),
  ratio_means001 = c(0.56, 0.202, 0.695, 0.436, 0.342, 10.1, 0.257, 0.123, 0.432, 0.496, 0.832),
  ratio_means002 = c(0.66, 0.203, 0.943, 0.432, 0.345, 0.439, 0.154, 0.234, NA, 0.932, 0.854)
)
Output:
speaker ratio_means001 ratio_means002
1 eng1 0.560 0.660
2 eng1 0.202 0.203
3 eng1 0.695 0.943
4 eng1 0.436 0.432
5 eng1 0.342 0.345
6 eng1 10.100 0.439
7 eng2 0.257 0.154
8 eng2 0.123 0.234
9 eng2 0.432 NA
10 eng2 0.496 0.932
11 eng2 0.832 0.854
Below is the basic code I want to turn into a function:
standardized_data = group_by(df, speaker) %>%
  mutate(zRatio1 = as.numeric(scale(ratio_means001))) %>%
  filter(!abs(zRatio1) > 2)
So that the data frame will now look like this (for example):
speaker ratio_means001 ratio_means002 zRatio1
(fctr) (dbl) (dbl) (dbl)
1 eng1 0.560 0.660 -0.3792191
2 eng1 0.202 0.203 -0.4699781
3 eng1 0.695 0.943 -0.3449943
4 eng1 0.436 0.432 -0.4106552
5 eng1 0.342 0.345 -0.4344858
6 eng2 0.257 0.154 -0.6349445
7 eng2 0.123 0.234 -1.1325034
8 eng2 0.432 NA 0.0148525
9 eng2 0.496 0.932 0.2524926
10 eng2 0.832 0.854 1.5001028
Here is what I have in terms of a function so far. The mutate part works, but I've been struggling with adding the filter part:
library(lazyeval)

standardize_variable = function(col1, new_col_name) {
  mutate_call = lazyeval::interp(b = interp(~ scale(a)), a = as.name(col1))
  group_by(data, speaker) %>%
    mutate_(.dots = setNames(list(mutate_call), new_col_name)) %>%
    filter_(interp(~ !abs(b) > 2.5, b = as.name(new_col_name))) # this part does not work
}
I receive the following error when I try to run the function:
data = standardize_variable("ratio_means001","zRatio1")
Error in substitute_(`_obj`[[2]], values) :
argument "_obj" is missing, with no default
Problem 2: Looping over the function
There are many variables that I'd like to apply the above function to, so I would like to find a way to either use a loop or another helpful function to help automate this process. The variable names differ only in a number at the end, so I have come up with something like this:
d <- data.frame()
for (i in 1:2) {
  col1 <- paste("ratio_means00", i, sep = "")
  new_col <- paste("zRatio", i, sep = "")
  d <- rbind(d, standardize_variable(col1, new_col))
}
However, I get the following error:
Error in match.names(clabs, names(xi)) :
names do not match previous names
Thanks again for any help on these issues!
Alternative 1
I believe the main problem you were having with your function had to do with you calling interp twice. Fixing that led to an additional problem with filter, which I think was due to scale adding attributes (I'm using a development version of dplyr, dplyr_0.4.3.9001). Wrapping as.numeric around scale gets rid of that.
So with the fixes your function looks like:
standardize_variable = function(col1, new_col_name) {
  mutate_call = lazyeval::interp(~ as.numeric(scale(a)), a = as.name(col1))
  group_by(df, speaker) %>%
    mutate_(.dots = setNames(list(mutate_call), new_col_name)) %>%
    filter_(interp(~ !abs(b) > 2, b = as.name(new_col_name)))
}
I found the loop through the variables to be a bit more complicated than what you had, as I believe you want to merge your datasets back together once you make one for each variable. One option is to save them to a list and then use do.call with merge to get the final dataset.
d = list()
for (i in 1:2) {
  col1 <- paste("ratio_means00", i, sep = "")
  new_col <- paste("zRatio", i, sep = "")
  d[[i]] = standardize_variable(col1, new_col)
}

do.call(merge, d)
speaker ratio_means001 ratio_means002 zRatio1 zRatio2
1 eng1 0.202 0.203 -0.4699781 -1.1490444
2 eng1 0.342 0.345 -0.4344858 -0.6063693
3 eng1 0.436 0.432 -0.4106552 -0.2738853
4 eng1 0.560 0.660 -0.3792191 0.5974521
5 eng1 0.695 0.943 -0.3449943 1.6789806
6 eng2 0.123 0.234 -1.1325034 -0.7620572
7 eng2 0.257 0.154 -0.6349445 -0.9590348
8 eng2 0.496 0.932 0.2524926 0.9565726
9 eng2 0.832 0.854 1.5001028 0.7645194
Alternative 2
An alternative to all of this would be to use mutate_each and rename_ for the first part of the problem and then use an interp with a lapply loop for the final filtering of all of the scaled variables simultaneously.
In the code below I take advantage of the fact that mutate_each allows naming for single functions starting in dplyr_0.4.3.9001. Things look a bit complicated in rename_ because I was making the names you wanted for the new columns. To simplify things you could leave them ending in _z from mutate_each and save yourself the complicated step of rename_ with gsub and grepl.
df2 = df %>%
  group_by(speaker) %>%
  mutate_each(funs(z = as.numeric(scale(.))), starts_with("ratio_means00")) %>%
  rename_(.dots = setNames(names(.)[grepl("z", names(.))],
                           paste0("zR", gsub("r|_z|_means00", "", names(.)[grepl("z", names(.))]))))
Once that's done, you just need to filter by multiple columns. I think it's easiest to make a list of the conditions you want to filter with, using interp and lapply, and then give that to the .dots argument of filter_.
dots = lapply(names(df2)[starts_with("z", vars = names(df2))],
              function(y) interp(~ abs(x) < 2, x = as.name(y)))

filter_(df2, .dots = dots)
Source: local data frame [9 x 5]
Groups: speaker [2]
speaker ratio_means001 ratio_means002 zRatio1 zRatio2
(fctr) (dbl) (dbl) (dbl) (dbl)
1 eng1 0.560 0.660 -0.3792191 0.5974521
2 eng1 0.202 0.203 -0.4699781 -1.1490444
3 eng1 0.695 0.943 -0.3449943 1.6789806
4 eng1 0.436 0.432 -0.4106552 -0.2738853
5 eng1 0.342 0.345 -0.4344858 -0.6063693
6 eng2 0.257 0.154 -0.6349445 -0.9590348
7 eng2 0.123 0.234 -1.1325034 -0.7620572
8 eng2 0.496 0.932 0.2524926 0.9565726
9 eng2 0.832 0.854 1.5001028 0.7645194
Alternative 3
I often find these problems most straightforward if I reshape the dataset instead of working across columns. For example, still using the newest version of mutate_each but skipping the renaming step for simplicity, you could gather all the standardized columns together using the gather function from tidyr and then filter the new column.
library(tidyr)

df %>%
  group_by(speaker) %>%
  mutate_each(funs(z = as.numeric(scale(.))), starts_with("ratio_means00")) %>%
  gather(group, zval, ends_with("_z")) %>%
  filter(abs(zval) < 2)
# First 12 lines of output
Source: local data frame [20 x 5]
Groups: speaker [2]
speaker ratio_means001 ratio_means002 group zval
<fctr> <dbl> <dbl> <chr> <dbl>
1 eng1 0.560 0.660 ratio_means001_z -0.3792191
2 eng1 0.202 0.203 ratio_means001_z -0.4699781
3 eng1 0.695 0.943 ratio_means001_z -0.3449943
4 eng1 0.436 0.432 ratio_means001_z -0.4106552
5 eng1 0.342 0.345 ratio_means001_z -0.4344858
6 eng2 0.257 0.154 ratio_means001_z -0.6349445
7 eng2 0.123 0.234 ratio_means001_z -1.1325034
8 eng2 0.432 NA ratio_means001_z 0.0148525
9 eng2 0.496 0.932 ratio_means001_z 0.2524926
10 eng2 0.832 0.854 ratio_means001_z 1.5001028
11 eng1 0.560 0.660 ratio_means002_z 0.5974521
12 eng1 0.202 0.203 ratio_means002_z -1.1490444
...
If the desired final form is the wide format, you can use spread (also from tidyr) for that. One advantage (to me) is that you can keep all values of one variable even when another variable failed the filtering step.
df %>%
  group_by(speaker) %>%
  mutate_each(funs(z = as.numeric(scale(.))), starts_with("ratio_means00")) %>%
  gather(group, zval, ends_with("_z")) %>%
  filter(abs(zval) < 2) %>%
  spread(group, zval)
Source: local data frame [11 x 5]
Groups: speaker [2]
speaker ratio_means001 ratio_means002 ratio_means001_z ratio_means002_z
<fctr> <dbl> <dbl> <dbl> <dbl>
1 eng1 0.202 0.203 -0.4699781 -1.1490444
2 eng1 0.342 0.345 -0.4344858 -0.6063693
3 eng1 0.436 0.432 -0.4106552 -0.2738853
4 eng1 0.560 0.660 -0.3792191 0.5974521
5 eng1 0.695 0.943 -0.3449943 1.6789806
6 eng1 10.100 0.439 NA -0.2471337
7 eng2 0.123 0.234 -1.1325034 -0.7620572
8 eng2 0.257 0.154 -0.6349445 -0.9590348
9 eng2 0.432 NA 0.0148525 NA
10 eng2 0.496 0.932 0.2524926 0.9565726
11 eng2 0.832 0.854 1.5001028 0.7645194
If you don't want to keep the NA, you can always na.omit them at a later time.
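For reference, the underscore verbs (mutate_, filter_) and mutate_each used above are deprecated in current dplyr. A sketch of the same standardize-and-filter step with dplyr >= 1.0, using name injection and the .data pronoun (not part of the original answers):

library(dplyr)

standardize_variable <- function(data, col, new_col) {
  data %>%
    group_by(speaker) %>%
    # "{new_col}" := injects the new column name from a string
    mutate("{new_col}" := as.numeric(scale(.data[[col]]))) %>%
    # same as !abs(z) > 2; NA scaled values are dropped either way
    filter(abs(.data[[new_col]]) <= 2) %>%
    ungroup()
}

df %>% standardize_variable("ratio_means001", "zRatio1")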
