Within column subtraction iteratively in R

I have the following data:
library(tidyverse)
age_grp <- c(10,9,8,7,6,5,4,3,2,1)
start <- c(0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420)
change <- c(0.020,0.033,0.029,0.031,0.027,0.032,0.032,0.030,0.027,0.034)
final_outcome <- c(0.400,0.367,0.338,0.307,0.28,0.248,0.216,0.186,0.159,0.125)
my_data <- data.frame(age_grp,start,change,final_outcome)
my_data1 <- my_data %>%
dplyr::arrange(age_grp)
I would like to subtract the values in the variable change from the values in the variable start so that the result decreases iteratively from the oldest age group to the youngest. The final values that I am looking to get are in the variable final_outcome. For example, starting with age_grp 10, I want to subtract 0.020 from 0.420 to get 0.400. Then I would like to subtract 0.033 from 0.400 to get 0.367, and so forth. I am struggling with how to store those differences. I have made an attempt below, but I don't know how to store each difference so that the subtraction can continue forward (or backward, depending on how you look at it). Any advice or suggestions would be appreciated.
my_data1$attempt <- NA
#calculating the decreases
for(i in 2:nrow(my_data1))
my_data1$attempt[i] <- round(my_data1$start[i] - my_data1$change[i-1], 4)

If we need the same output as in final_outcome
library(dplyr)
my_data %>%
mutate(attempt = start - cumsum(change)) %>%
arrange(age_grp)
-output
# age_grp start change final_outcome attempt
#1 1 0.42 0.034 0.125 0.125
#2 2 0.42 0.027 0.159 0.159
#3 3 0.42 0.030 0.186 0.186
#4 4 0.42 0.032 0.216 0.216
#5 5 0.42 0.032 0.248 0.248
#6 6 0.42 0.027 0.280 0.280
#7 7 0.42 0.031 0.307 0.307
#8 8 0.42 0.029 0.338 0.338
#9 9 0.42 0.033 0.367 0.367
#10 10 0.42 0.020 0.400 0.400

my_data$final <- my_data$start - cumsum(my_data$change)

library(tidyverse)
my_data %>%
mutate(attempt = accumulate(change, ~ .x - .y, .init = start[1])[-1])
Note: accumulate is from the purrr library that's part of the tidyverse. It also has a .dir argument where you can go "forward" or "backward".
Or in base R using Reduce:
within(my_data, attempt <- Reduce("-", change, init = start[1], accumulate = TRUE)[-1])
Reduce has an argument right that can also do the computation forwards or backwards.
Output
age_grp start change final_outcome attempt
1 10 0.42 0.020 0.400 0.400
2 9 0.42 0.033 0.367 0.367
3 8 0.42 0.029 0.338 0.338
4 7 0.42 0.031 0.307 0.307
5 6 0.42 0.027 0.280 0.280
6 5 0.42 0.032 0.248 0.248
7 4 0.42 0.032 0.216 0.216
8 3 0.42 0.030 0.186 0.186
9 2 0.42 0.027 0.159 0.159
10 1 0.42 0.034 0.125 0.125
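
For comparison with the loop in the question, here is a minimal sketch of the same calculation written as an explicit loop. The idea is to keep the running value in its own variable (the name running is only illustrative) and subtract the current row's change from it, rather than from start. It assumes the rows are already ordered from the oldest to the youngest age group, as in my_data.

```r
my_data <- data.frame(
  age_grp = 10:1,
  start = rep(0.420, 10),
  change = c(0.020, 0.033, 0.029, 0.031, 0.027, 0.032, 0.032, 0.030, 0.027, 0.034),
  final_outcome = c(0.400, 0.367, 0.338, 0.307, 0.28, 0.248, 0.216, 0.186, 0.159, 0.125)
)

# 'running' holds the value carried from one row to the next
running <- my_data$start[1]
my_data$attempt <- NA_real_

for (i in seq_len(nrow(my_data))) {
  running <- running - my_data$change[i]  # subtract this row's change
  my_data$attempt[i] <- running           # store the result for this row
}

# TRUE if the loop reproduces final_outcome (up to floating-point tolerance)
isTRUE(all.equal(my_data$attempt, my_data$final_outcome))
```

This gives the same values as start - cumsum(change); the vectorised form is shorter, but the loop makes the carried-over value explicit.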

Related

Issue in extracting a specific column from psych::alpha output

library(mirt) #this contains a dataset called deAyala.
library(psych) #this contains the alpha() function.
alpha(deAyala)
Using this function gives me the following output:
Some items ( Item.4 Item.5 ) were negatively correlated with the total scale and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
Reliability analysis
Call: alpha(x = deAyala)
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
0.00097 0.21 0.32 0.042 0.27 0.00057 103 134
median_r
0
lower alpha upper 95% confidence boundaries
0 0 0
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N
Item.1 5.9e-05 0.019 0.057 0.0038 0.019
Item.2 6.5e-04 0.178 0.293 0.0414 0.216
Item.3 8.4e-04 0.220 0.342 0.0535 0.283
Item.4 1.2e-03 0.289 0.385 0.0750 0.406
Item.5 1.3e-03 0.306 0.387 0.0812 0.442
Frequency 0.0e+00 0.000 0.000 0.0000 0.000
alpha se var.r med.r
Item.1 0.00056 0.011 0
Item.2 0.00055 0.044 0
Item.3 0.00054 0.047 0
Item.4 0.00052 0.044 0
Item.5 0.00051 0.041 0
Frequency 0.27951 0.000 0
Item statistics
n raw.r std.r r.cor r.drop mean sd
Item.1 32 0.60 0.59 0.657 0.60 0.5 0.51
Item.2 32 0.22 0.45 0.202 0.22 0.5 0.51
Item.3 32 0.10 0.41 0.080 0.10 0.5 0.51
Item.4 32 -0.11 0.33 -0.059 -0.11 0.5 0.51
Item.5 32 -0.17 0.31 -0.079 -0.17 0.5 0.51
Frequency 32 1.00 0.61 0.722 0.29 612.5 804.25
Non missing response frequency for each item
0 1 miss
Item.1 0.5 0.5 0
Item.2 0.5 0.5 0
Item.3 0.5 0.5 0
Item.4 0.5 0.5 0
Item.5 0.5 0.5 0
I ONLY want the raw.r column ([1:6] 0.6 0.223 0.103 -0.112 -0.174) in item statistics table, and I want to store them in a variable. How can I do that? I tried the following:
str(alpha(deAyala)$item.stats$raw.r)
But this gives me a lot of text:
Some items ( Item.4 Item.5 ) were negatively correlated with the total scale and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option num [1:6] 0.6 0.223 0.103 -0.112 -0.174 ...
Warning message:
In alpha(deAyala) :
Some items were negatively correlated with the total scale and probably
should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option

You can get rid of the warning by wrapping the call in suppressWarnings(), but the first part of the message looks like a plain print statement inside the alpha() function. This will work, though:
invisible(capture.output(x <- suppressWarnings(alpha(deAyala)$item.stats$raw.r)))
EDIT: Actually I just looked at the help for alpha and it has a warnings option you can just set to FALSE.
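
To show the capture pattern without needing psych or mirt installed, here is a small self-contained sketch. The function noisy_stat() is hypothetical and only imitates a function that prints text and raises a warning; it is not part of any package. Note that capture.output() intercepts printed output only, so this sketch assumes the unwanted text is produced with cat() or print().

```r
# Hypothetical stand-in for a function that prints and warns (not psych::alpha)
noisy_stat <- function(x) {
  cat("Some items were negatively correlated with the total scale\n")
  warning("Some items probably should be reversed")
  list(item.stats = data.frame(raw.r = x))
}

# capture.output() swallows the printed text, suppressWarnings() drops the
# warning, and invisible() hides the captured text itself
invisible(capture.output(
  raw_r <- suppressWarnings(noisy_stat(c(0.6, 0.223, 0.103))$item.stats$raw.r)
))

raw_r
```

With psych itself, the route mentioned in the edit above would be a call such as alpha(deAyala, warnings = FALSE)$item.stats$raw.r.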

Pre-defining number format of a data frame in R

Is there any way to pre-define a number format (e.g. rounding off to the specified number of decimal places) of a data frame so that whenever I add a new column it follows the same format?
I tried with format {base}, but it only changes the format of the existing columns not for the ones I add after.
A workable example is given below
mydf <- as.data.frame(matrix(rnorm(50), ncol=5))
mydf
V1 V2 V3 V4 V5
1 -1.3088022 -0.22088032 -1.8739405 1.65276442 1.21762297
2 1.1123253 -0.76042101 -0.1608188 0.39945804 -0.58674209
3 -0.9366654 0.92893610 -0.6905299 -0.37374892 -1.70539909
4 0.4619175 -0.28929198 1.0280021 -0.87998207 -0.34493824
5 -0.3741670 -0.61782368 -1.0435906 0.52166082 -0.29308408
6 -1.2283031 -0.37065379 0.8652538 0.05088202 -1.80997313
7 -1.1137726 -0.97878307 0.5045051 0.85442196 0.02932812
8 0.3373866 -0.46614754 -0.4642278 -0.38438002 -1.47251777
9 0.3245720 -0.06047061 -0.3273080 0.49145133 -0.86507348
10 1.6459180 -1.31076464 1.5627246 0.49841764 0.73895626
The following changes the format of the data frame:
mydf <- format(mydf, digits=2)
mydf
V1 V2 V3 V4 V5
1 -1.31 -0.22 -1.87 1.653 1.218
2 1.11 -0.76 -0.16 0.399 -0.587
3 -0.94 0.93 -0.69 -0.374 -1.705
4 0.46 -0.29 1.03 -0.880 -0.345
5 -0.37 -0.62 -1.04 0.522 -0.293
6 -1.23 -0.37 0.87 0.051 -1.810
7 -1.11 -0.98 0.50 0.854 0.029
8 0.34 -0.47 -0.46 -0.384 -1.473
9 0.32 -0.06 -0.33 0.491 -0.865
10 1.65 -1.31 1.56 0.498 0.739
But this formatting is not applied when I add a new column to the data frame, see below:
mydf$new <- rnorm(10)
mydf
V1 V2 V3 V4 V5 new
1 -1.31 -0.22 -1.87 1.653 1.218 0.30525117
2 1.11 -0.76 -0.16 0.399 -0.587 -1.83038790
3 -0.94 0.93 -0.69 -0.374 -1.705 0.34830499
4 0.46 -0.29 1.03 -0.880 -0.345 -0.66017888
5 -0.37 -0.62 -1.04 0.522 -0.293 0.03103741
6 -1.23 -0.37 0.87 0.051 -1.810 1.32809006
7 -1.11 -0.98 0.50 0.854 0.029 0.85428977
8 0.34 -0.47 -0.46 -0.384 -1.473 -0.51917266
9 0.32 -0.06 -0.33 0.491 -0.865 -0.37057104
10 1.65 -1.31 1.56 0.498 0.739 -1.32447706
I know I can adjust the digits using print {base}, but that also does not change the underlying format of the data frame. Any suggestion? Thanks in advance.
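
One possible workaround, offered as a sketch rather than a built-in feature: a base R data frame stores values, not display formats, and format() returns character columns, so no setting carries over to columns added later. A small helper can instead re-round every numeric column after new columns are added. The name round_df is hypothetical, not a base R function.

```r
# Hypothetical helper: round all numeric columns of a data frame
round_df <- function(df, digits = 2) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], round, digits = digits)
  df
}

set.seed(1)
mydf <- as.data.frame(matrix(rnorm(50), ncol = 5))
mydf$new <- rnorm(10)                # column added later
mydf <- round_df(mydf, digits = 2)   # re-apply the rounding to every numeric column
mydf
```

Unlike format(), this keeps the columns numeric, so they can still be used in calculations.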

Boxplot factor across many samples with R

Given the data
Step A B C D E F G I J
1 1 0.158 0.011 0.099 6.504 5.914 0.000 0.100 0.330 0.000
2 2 0.345 0.016 0.102 6.050 5.285 0.000 0.102 0.316 0.001
3 1 0.324 0.015 0.100 7.146 6.426 0.000 0.101 0.293 0.000
4 2 0.264 0.015 0.099 5.864 5.202 0.000 0.101 0.296 0.000
5 1 0.346 0.022 0.101 5.889 5.027 0.000 0.101 0.411 0.000
6 2 0.397 0.022 0.130 6.061 5.311 0.000 0.131 0.220 0.000
7 1 0.337 0.015 0.048 7.417 6.839 0.000 0.110 0.129 0.000
8 2 0.362 0.016 0.143 5.726 4.951 0.001 0.144 0.268 0.000
9 1 0.178 0.011 0.099 5.831 5.290 0.000 0.100 0.261 0.000
d <- read.table('sample.txt', header=T) gives me a data frame, and boxplot(d$A ~ d$Step) yields a reasonable graph, but I cannot seem to get all plots on the same graph. Something like boxplot(d ~ d$Step) is what I expected to work, but I get the following error:
Error in model.frame.default(formula = d ~ d$Step) :
invalid type (list) for variable 'd'
I've tried making Step a factor d$Step <- as.factor(d$Step) but that seems to have no effect.

An alternative is to plot these in base R, each on its own scale, like this:
par(mfrow=c(3,3))
for(i in 2:10) {
  boxplot(d[,i] ~ d$Step, main=names(d)[i])
}

We can do this with tidyverse
library(tidyverse)
gather(d, Var, Val, -Step) %>%
mutate(Step=factor(Step)) %>%
ggplot(., aes(x=Var, y = Val, fill=Step)) +
geom_boxplot() +
scale_fill_manual(values = c("red", "blue"))
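
A base R sketch of the same single-panel idea, for readers who prefer not to load tidyverse: reshape the data to long form with stack() and give boxplot() a two-factor formula. Only the first four rows of the sample data are used here to keep the example short, and the name long is just illustrative.

```r
d <- read.table(text = "
  Step A B C D E F G I J
1 1 0.158 0.011 0.099 6.504 5.914 0.000 0.100 0.330 0.000
2 2 0.345 0.016 0.102 6.050 5.285 0.000 0.102 0.316 0.001
3 1 0.324 0.015 0.100 7.146 6.426 0.000 0.101 0.293 0.000
4 2 0.264 0.015 0.099 5.864 5.202 0.000 0.101 0.296 0.000
", header = TRUE)

# stack() puts all measurement columns into 'values' with the column name in 'ind';
# Step is repeated once per stacked column so it lines up with 'values'
long <- data.frame(
  Step = factor(rep(d$Step, times = ncol(d) - 1)),
  stack(d[-1])
)

# One box per Step within each variable, all on a shared axis
boxplot(values ~ Step + ind, data = long, las = 2, col = c("red", "blue"))
```

Because all variables share one y-axis, the small-valued columns are compressed next to D and E; the par(mfrow) approach avoids that by giving each variable its own scale.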

Extract rows from a matrix based of values from another matrix

I need your help!
I am trying to pull out rows of the second matrix based on IDs from the first matrix. To check that my function (which is not provided here) works correctly, I run the following code (CritMat is the second matrix and parms is the first):
results <- matrix(0, nrow = 15, ncol = 8)
colnames(results) <- c("alpha", "beta", "omega", "T=64", "T=128", "T=256", "T=512", "T=1024")
for (r in 1:15) {
results [r,] <- CritMat[CritMat[, 1] == parms[r, 2] & CritMat[, 2] ==
parms[r, 1] & CritMat[, 3] == parms[r, 3] , ]
print(results[r,])
}
The loop works for the first 4 iterations followed by the following error message for the fifth:
*Error in results[r, ] <- CritMat[CritMat[, 1] == parms[r, 2] & CritMat[, :
replacement has length zero*
Any idea why this happens, and how to fix it?
Many thanks
AA
****parms matrix****
beta alpha omega
1 0.005 0.005 0.990
2 0.240 0.005 0.755
3 0.490 0.005 0.505
4 0.740 0.005 0.255
5 0.990 0.005 0.005
6 0.005 0.250 0.745
7 0.240 0.250 0.510
8 0.490 0.250 0.260
9 0.740 0.250 0.010
10 0.005 0.500 0.495
11 0.240 0.500 0.260
12 0.490 0.500 0.010
13 0.005 0.750 0.245
14 0.240 0.750 0.010
15 0.005 0.990 0.005
****CritMat matrix****
alpha beta omega T.64 T.128 T.256 T.512 T.1024
1 0.005 0.005 0.990 -2.956420 -2.919654 -2.921704 -2.886429 -2.879443
2 0.005 0.240 0.755 -2.959242 -2.917744 -2.923356 -2.885018 -2.881905
3 0.005 0.490 0.505 -2.959395 -2.915798 -2.927405 -2.886637 -2.885186
4 0.005 0.740 0.255 -2.957763 -2.912088 -2.934518 -2.890182 -2.889484
5 0.005 0.990 0.005 -2.937999 -2.857668 -2.864637 -2.819950 -2.820588
6 0.250 0.005 0.745 -2.987160 -2.986864 -2.897846 -2.865875 -2.911572
7 0.250 0.240 0.510 -3.034868 -2.979375 -2.924888 -2.875446 -2.898752
8 0.250 0.490 0.260 -3.052279 -2.995942 -2.969414 -2.926178 -2.918958
9 0.250 0.740 0.010 -3.197169 -3.263336 -3.258011 -3.202253 -3.248068
10 0.500 0.005 0.495 -3.031267 -3.038585 -2.936348 -2.921126 -2.908868
11 0.500 0.240 0.260 -3.142031 -3.086536 -3.026555 -3.079825 -2.871080
12 0.500 0.490 0.010 -3.383052 -3.410789 -3.431221 -3.367462 -3.332024
13 0.750 0.005 0.245 -3.209441 -3.170385 -3.112472 -3.141569 -2.925559
14 0.750 0.240 0.010 -3.452131 -3.517234 -3.428402 -3.477691 -3.178128
15 0.990 0.005 0.005 -3.427804 -3.491805 -3.298037 -3.290127 -3.087541
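
One possible cause, stated as an assumption because the code that builds the matrices is not shown: "replacement has length zero" means that no row of CritMat satisfied all three == tests, and exact equality on decimals can fail when the values were produced by arithmetic (for example omega computed as 1 - alpha - beta). Below is a sketch of matching with a tolerance instead. The helper names near and match_row, the tolerance 1e-8, and the two-row matrices are illustrative only.

```r
# Hypothetical helpers: compare with a tolerance instead of ==
near <- function(a, b, tol = 1e-8) abs(a - b) < tol

match_row <- function(CritMat, p) {
  hit <- near(CritMat[, "alpha"], p["alpha"]) &
         near(CritMat[, "beta"],  p["beta"])  &
         near(CritMat[, "omega"], p["omega"])
  CritMat[hit, , drop = FALSE]  # always a matrix, possibly with zero rows
}

# Small illustrative data: omega in CritMat is computed, in parms it is typed in
CritMat <- cbind(alpha = c(0.005, 0.005), beta = c(0.005, 0.990))
CritMat <- cbind(CritMat, omega = 1 - CritMat[, "alpha"] - CritMat[, "beta"],
                 T.64 = c(-2.956420, -2.937999))
parms <- cbind(beta = c(0.005, 0.990), alpha = c(0.005, 0.005),
               omega = c(0.990, 0.005))

found <- match_row(CritMat, parms[2, ])
# Guard before assigning into a results row
if (nrow(found) == 1) print(found[1, ]) else message("no unique match for row 2")
```

Matching columns by name rather than by position also avoids mixing up alpha and beta, which are stored in a different order in the two matrices.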

Evaluating a matrix by row for a condition being met in R

I've got data in the following format.
P10_neg._qn P11_neg._qn P12_neg._qn P14_neg._qn P17_neg._qn P24_neg._qn P25_neg._qn
1 -0.025 -0.037 -0.032 -0.061 -0.176 0.033 -0.011
2 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
3 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
4 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
5 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
6 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
What is the best way by which I can check, for every row, how many entries are greater than 0.1, for instance and return a vector of counts?

You can use the rowSums function for this task. Assuming that dat is your matrix:
rowSums(dat > 0.1)
Using the sample data provided we have :
dat <- read.table(text = ' P10_neg._qn P11_neg._qn P12_neg._qn P14_neg._qn P17_neg._qn P24_neg._qn P25_neg._qn
1 -0.025 -0.037 -0.032 -0.061 -0.176 0.033 -0.011
2 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
3 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
4 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
5 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
6 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087',
row.names = 1, header = TRUE)
rowSums(dat > 0.1)
## 1 2 3 4 5 6
## 0 1 1 1 1 1

apply(dat, 1, function(x) sum(x>.1))
# [1] 0 1 1 1 1 1

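Here is an Rcpp version:
#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector countGreaterThan2(NumericMatrix M, double val) {
  IntegerVector res(M.nrow());  // one count per row, allocated up front
  for (int i = 0; i < M.nrow(); i++) {
    NumericVector row = M(i, _);
    // count the entries in this row that are strictly greater than val
    res[i] = std::count_if(row.begin(), row.end(),
                           [val](double x) { return x > val; });
  }
  return res;
}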
But rowSums is unbeatable:
system.time(rowSums(dfx>0.2))
user system elapsed
0.01 0.00 0.02
> system.time(countGreaterThan2(dfx,0.2))
user system elapsed
0.06 0.00 0.06
