I have the following data:
library(tidyverse)
age_grp <- c(10,9,8,7,6,5,4,3,2,1)
start <- c(0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420,0.420)
change <- c(0.020,0.033,0.029,0.031,0.027,0.032,0.032,0.030,0.027,0.034)
final_outcome <- c(0.400,0.367,0.338,0.307,0.28,0.248,0.216,0.186,0.159,0.125)
my_data <- data.frame(age_grp,start,change,final_outcome)
my_data1 <- my_data %>%
dplyr::arrange(age_grp)
I would like to subtract the values in change from the values in start iteratively, from the oldest age group to the youngest. The values I am looking for are in final_outcome. For example, starting with age_grp 10, I want to subtract 0.020 from 0.420 to get 0.400. Then I would like to subtract 0.033 from 0.400 to get 0.367, and so forth. I am struggling with how to store those differences. My attempt is below, but I don't know how to store each difference so that the subtraction can continue forward (or backward, depending on how you look at it). Any advice or suggestions would be appreciated.
my_data1$attempt <- NA
#calculating the decreases
for(i in 2:nrow(my_data1))
my_data1$attempt[i] <- round(my_data1$start[i] - my_data1$change[i-1], 4)
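The missing piece in the loop above is carrying the previous result forward instead of starting again from start on every iteration. A minimal sketch, assuming the rows are in the original my_data order (age_grp 10 first); current is an illustrative name, not from the question:

```r
start  <- rep(0.420, 10)
change <- c(0.020, 0.033, 0.029, 0.031, 0.027, 0.032, 0.032, 0.030, 0.027, 0.034)

attempt <- numeric(length(change))
current <- start[1]                 # begin from the shared starting value
for (i in seq_along(change)) {
  current <- current - change[i]    # subtract this group's change from the running value
  attempt[i] <- current             # store it so the next iteration builds on it
}
round(attempt, 3)
```

Because each iteration reuses current, the stored value is exactly what the next subtraction needs.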
If we need the same output as in final_outcome, subtract the cumulative sum of change from start:
library(dplyr)
my_data %>%
mutate(attempt = start - cumsum(change)) %>%
arrange(age_grp)
Output
# age_grp start change final_outcome attempt
#1 1 0.42 0.034 0.125 0.125
#2 2 0.42 0.027 0.159 0.159
#3 3 0.42 0.030 0.186 0.186
#4 4 0.42 0.032 0.216 0.216
#5 5 0.42 0.032 0.248 0.248
#6 6 0.42 0.027 0.280 0.280
#7 7 0.42 0.031 0.307 0.307
#8 8 0.42 0.029 0.338 0.338
#9 9 0.42 0.033 0.367 0.367
#10 10 0.42 0.020 0.400 0.400
Or in base R:
my_data$final <- my_data$start - cumsum(my_data$change)
library(tidyverse)
my_data %>%
mutate(attempt = accumulate(change, ~ .x - .y, .init = start[1])[-1])
Note: accumulate() comes from the purrr package, which is part of the tidyverse. It also has a .dir argument that accepts "forward" or "backward".
Or in base R using Reduce:
within(my_data, attempt <- Reduce("-", change, init = start[1], accumulate = TRUE)[-1])
Reduce() also has a right argument; with right = TRUE the fold runs from the right instead of from the left.
Output
age_grp start change final_outcome attempt
1 10 0.42 0.020 0.400 0.400
2 9 0.42 0.033 0.367 0.367
3 8 0.42 0.029 0.338 0.338
4 7 0.42 0.031 0.307 0.307
5 6 0.42 0.027 0.280 0.280
6 5 0.42 0.032 0.248 0.248
7 4 0.42 0.032 0.216 0.216
8 3 0.42 0.030 0.186 0.186
9 2 0.42 0.027 0.159 0.159
10 1 0.42 0.034 0.125 0.125
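To see why the answers above drop the first element with [-1]: with accumulate = TRUE, Reduce() returns the initial value followed by every intermediate result. A small sketch using the vectors from the question:

```r
start  <- rep(0.420, 10)
change <- c(0.020, 0.033, 0.029, 0.031, 0.027, 0.032, 0.032, 0.030, 0.027, 0.034)

# Left fold: init, init - change[1], (init - change[1]) - change[2], ...
steps <- Reduce("-", change, accumulate = TRUE, init = start[1])
length(steps)   # 11: the initial value plus one result per change
steps[1]        # 0.42, the init value itself

attempt <- steps[-1]  # drop the init value to get one result per age group
```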
library(mirt) #this contains a dataset called deAyala.
library(psych) #this contains the alpha() function.
alpha(deAyala)
Using this function gives me the following output:
Some items ( Item.4 Item.5 ) were negatively correlated with the total scale and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
Reliability analysis
Call: alpha(x = deAyala)
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
0.00097 0.21 0.32 0.042 0.27 0.00057 103 134
median_r
0
lower alpha upper 95% confidence boundaries
0 0 0
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N
Item.1 5.9e-05 0.019 0.057 0.0038 0.019
Item.2 6.5e-04 0.178 0.293 0.0414 0.216
Item.3 8.4e-04 0.220 0.342 0.0535 0.283
Item.4 1.2e-03 0.289 0.385 0.0750 0.406
Item.5 1.3e-03 0.306 0.387 0.0812 0.442
Frequency 0.0e+00 0.000 0.000 0.0000 0.000
alpha se var.r med.r
Item.1 0.00056 0.011 0
Item.2 0.00055 0.044 0
Item.3 0.00054 0.047 0
Item.4 0.00052 0.044 0
Item.5 0.00051 0.041 0
Frequency 0.27951 0.000 0
Item statistics
n raw.r std.r r.cor r.drop mean sd
Item.1 32 0.60 0.59 0.657 0.60 0.5 0.51
Item.2 32 0.22 0.45 0.202 0.22 0.5 0.51
Item.3 32 0.10 0.41 0.080 0.10 0.5 0.51
Item.4 32 -0.11 0.33 -0.059 -0.11 0.5 0.51
Item.5 32 -0.17 0.31 -0.079 -0.17 0.5 0.51
Frequency 32 1.00 0.61 0.722 0.29 612.5 804.25
Non missing response frequency for each item
0 1 miss
Item.1 0.5 0.5 0
Item.2 0.5 0.5 0
Item.3 0.5 0.5 0
Item.4 0.5 0.5 0
Item.5 0.5 0.5 0
I only want the raw.r column of the item statistics table (num [1:6] 0.6 0.223 0.103 -0.112 -0.174 ...), and I want to store it in a variable. How can I do that? I tried the following:
str(alpha(deAyala)$item.stats$raw.r)
But this gives me a lot of text:
Some items ( Item.4 Item.5 ) were negatively correlated with the total scale and
probably should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option num [1:6] 0.6 0.223 0.103 -0.112 -0.174 ...
Warning message:
In alpha(deAyala) :
Some items were negatively correlated with the total scale and probably
should be reversed.
To do this, run the function again with the 'check.keys=TRUE' option
You can get rid of the warning by wrapping the call in suppressWarnings(), but the first part of the output appears to be printed directly by alpha() rather than raised as a warning, so it also needs capture.output(). This works:
invisible(capture.output(x <- suppressWarnings(alpha(deAyala)$item.stats$raw.r)))
EDIT: Actually, the help page for alpha() shows that it has a warnings argument, which you can simply set to FALSE.
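The capture.output() plus suppressWarnings() pattern works for any function that both prints text and raises a warning. A self-contained sketch; noisy_fn() is a hypothetical stand-in for alpha() so the example runs without mirt or psych:

```r
# Hypothetical stand-in: prints a note with cat(), raises a warning,
# and returns a list with an item.stats data frame (values from the question).
noisy_fn <- function() {
  cat("Some items were negatively correlated with the total scale\n")
  warning("Some items were negatively correlated")
  list(item.stats = data.frame(raw.r = c(0.60, 0.22, 0.10, -0.11, -0.17)))
}

# capture.output() swallows the printed text, suppressWarnings() drops the
# warning, and the assignment still runs in the calling environment.
invisible(capture.output(x <- suppressWarnings(noisy_fn()$item.stats$raw.r)))
x
```

capture.output() evaluates its argument in the caller's frame, which is why x exists afterwards.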
Is there any way to pre-define a number format (e.g. rounding off to the specified number of decimal places) of a data frame so that whenever I add a new column it follows the same format?
I tried format {base}, but it only changes the format of the existing columns, not of the ones I add afterwards.
A reproducible example is given below:
mydf <- as.data.frame(matrix(rnorm(50), ncol=5))
mydf
V1 V2 V3 V4 V5
1 -1.3088022 -0.22088032 -1.8739405 1.65276442 1.21762297
2 1.1123253 -0.76042101 -0.1608188 0.39945804 -0.58674209
3 -0.9366654 0.92893610 -0.6905299 -0.37374892 -1.70539909
4 0.4619175 -0.28929198 1.0280021 -0.87998207 -0.34493824
5 -0.3741670 -0.61782368 -1.0435906 0.52166082 -0.29308408
6 -1.2283031 -0.37065379 0.8652538 0.05088202 -1.80997313
7 -1.1137726 -0.97878307 0.5045051 0.85442196 0.02932812
8 0.3373866 -0.46614754 -0.4642278 -0.38438002 -1.47251777
9 0.3245720 -0.06047061 -0.3273080 0.49145133 -0.86507348
10 1.6459180 -1.31076464 1.5627246 0.49841764 0.73895626
The following changes the format of the data frame:
mydf <- format(mydf, digits=2)
mydf
V1 V2 V3 V4 V5
1 -1.31 -0.22 -1.87 1.653 1.218
2 1.11 -0.76 -0.16 0.399 -0.587
3 -0.94 0.93 -0.69 -0.374 -1.705
4 0.46 -0.29 1.03 -0.880 -0.345
5 -0.37 -0.62 -1.04 0.522 -0.293
6 -1.23 -0.37 0.87 0.051 -1.810
7 -1.11 -0.98 0.50 0.854 0.029
8 0.34 -0.47 -0.46 -0.384 -1.473
9 0.32 -0.06 -0.33 0.491 -0.865
10 1.65 -1.31 1.56 0.498 0.739
But this formatting is not applied when I add a new column to the data frame:
mydf$new <- rnorm(10)
mydf
V1 V2 V3 V4 V5 new
1 -1.31 -0.22 -1.87 1.653 1.218 0.30525117
2 1.11 -0.76 -0.16 0.399 -0.587 -1.83038790
3 -0.94 0.93 -0.69 -0.374 -1.705 0.34830499
4 0.46 -0.29 1.03 -0.880 -0.345 -0.66017888
5 -0.37 -0.62 -1.04 0.522 -0.293 0.03103741
6 -1.23 -0.37 0.87 0.051 -1.810 1.32809006
7 -1.11 -0.98 0.50 0.854 0.029 0.85428977
8 0.34 -0.47 -0.46 -0.384 -1.473 -0.51917266
9 0.32 -0.06 -0.33 0.491 -0.865 -0.37057104
10 1.65 -1.31 1.56 0.498 0.739 -1.32447706
I know I can adjust the digits using print {base}, but that also does not change the underlying format of the data frame. Any suggestion? Thanks in advance.
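For context on why the formatting is not inherited: format() returns a data frame of character columns, so it fixes the printed text of the columns that exist at that moment rather than setting a property of the data frame. A minimal sketch of one workaround, assuming rounding the stored values is acceptable; add_rounded_col() is a hypothetical helper, not a base R function, and round() counts decimal places whereas format(digits = 2) counts significant digits:

```r
# Hypothetical helper: store a new column already rounded to `digits` decimals.
add_rounded_col <- function(df, name, values, digits = 2) {
  df[[name]] <- round(values, digits)
  df
}

set.seed(1)
mydf <- as.data.frame(matrix(rnorm(50), ncol = 5))

# format() turns every column into character, so arithmetic on it would fail
formatted <- format(mydf, digits = 2)

# round() keeps the columns numeric
rounded <- as.data.frame(lapply(mydf, round, digits = 2))
rounded <- add_rounded_col(rounded, "new", rnorm(10))
```

The rounding still has to happen on every assignment; nothing in a plain data.frame applies it automatically.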
Given the data
Step A B C D E F G I J
1 1 0.158 0.011 0.099 6.504 5.914 0.000 0.100 0.330 0.000
2 2 0.345 0.016 0.102 6.050 5.285 0.000 0.102 0.316 0.001
3 1 0.324 0.015 0.100 7.146 6.426 0.000 0.101 0.293 0.000
4 2 0.264 0.015 0.099 5.864 5.202 0.000 0.101 0.296 0.000
5 1 0.346 0.022 0.101 5.889 5.027 0.000 0.101 0.411 0.000
6 2 0.397 0.022 0.130 6.061 5.311 0.000 0.131 0.220 0.000
7 1 0.337 0.015 0.048 7.417 6.839 0.000 0.110 0.129 0.000
8 2 0.362 0.016 0.143 5.726 4.951 0.001 0.144 0.268 0.000
9 1 0.178 0.011 0.099 5.831 5.290 0.000 0.100 0.261 0.000
d <- read.table('sample.txt', header = TRUE) gives me a data frame, and boxplot(d$A ~ d$Step) yields a reasonable graph, but I cannot get all the variables on the same graph. I expected something like boxplot(d ~ d$Step) to work, but I get the following error:
Error in model.frame.default(formula = d ~ d$Step) :
invalid type (list) for variable 'd'
I've tried making Step a factor with d$Step <- as.factor(d$Step), but that seems to have no effect.
An alternative is to plot each variable in base R on its own scale, like this:
par(mfrow = c(3, 3))  # 3 x 3 grid for the 9 variables
for (i in 2:10) {
  boxplot(d[, i] ~ d$Step, main = names(d)[i])
}
We can put all variables on one plot with the tidyverse:
library(tidyverse)
gather(d, Var, Val, -Step) %>%
mutate(Step=factor(Step)) %>%
ggplot(., aes(x=Var, y = Val, fill=Step)) +
geom_boxplot() +
scale_fill_manual(values = c("red", "blue"))
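The same single-plot idea can be sketched in base R, assuming d has the columns shown in the question (Step, then A to J without H). stack() plays the role of gather() here:

```r
# Data from the question; the first unnamed column becomes row names
d <- read.table(text = "
Step A B C D E F G I J
1 1 0.158 0.011 0.099 6.504 5.914 0.000 0.100 0.330 0.000
2 2 0.345 0.016 0.102 6.050 5.285 0.000 0.102 0.316 0.001
3 1 0.324 0.015 0.100 7.146 6.426 0.000 0.101 0.293 0.000
4 2 0.264 0.015 0.099 5.864 5.202 0.000 0.101 0.296 0.000
5 1 0.346 0.022 0.101 5.889 5.027 0.000 0.101 0.411 0.000
6 2 0.397 0.022 0.130 6.061 5.311 0.000 0.131 0.220 0.000
7 1 0.337 0.015 0.048 7.417 6.839 0.000 0.110 0.129 0.000
8 2 0.362 0.016 0.143 5.726 4.951 0.001 0.144 0.268 0.000
9 1 0.178 0.011 0.099 5.831 5.290 0.000 0.100 0.261 0.000
", header = TRUE)

# stack() puts columns A..J into one 'values' column plus an 'ind' factor with
# the column name; columns are stacked one after another, so Step repeats
# once per stacked column.
long <- stack(d[, -1])
long$Step <- factor(rep(d$Step, times = ncol(d) - 1))

# One box per Step within each variable, all on the same axes
b <- boxplot(values ~ Step + ind, data = long, las = 2)
```

Because E and F are around 6 while the other variables are below 1, a shared axis compresses the small ones, which is the trade-off the per-panel base R answer avoids.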
I need your help!
I am trying to pull out rows of the second matrix based on IDs from the first matrix. To check that my function (which is not provided here) works correctly, I run the following code (CritMat is the second matrix and parms is the first):
results <- matrix(0, nrow = 15, ncol = 8)
colnames(results) <- c("alpha", "beta", "omega", "T=64", "T=128", "T=256", "T=512", "T=1024")
for (r in 1:15) {
results [r,] <- CritMat[CritMat[, 1] == parms[r, 2] & CritMat[, 2] ==
parms[r, 1] & CritMat[, 3] == parms[r, 3] , ]
print(results[r,])
}
The loop works for the first 4 iterations followed by the following error message for the fifth:
Error in results[r, ] <- CritMat[CritMat[, 1] == parms[r, 2] & CritMat[, :
replacement has length zero
Any idea why this happens, and how to fix it?
Many thanks
AA
parms matrix:
beta alpha omega
1 0.005 0.005 0.990
2 0.240 0.005 0.755
3 0.490 0.005 0.505
4 0.740 0.005 0.255
5 0.990 0.005 0.005
6 0.005 0.250 0.745
7 0.240 0.250 0.510
8 0.490 0.250 0.260
9 0.740 0.250 0.010
10 0.005 0.500 0.495
11 0.240 0.500 0.260
12 0.490 0.500 0.010
13 0.005 0.750 0.245
14 0.240 0.750 0.010
15 0.005 0.990 0.005
CritMat matrix:
alpha beta omega T.64 T.128 T.256 T.512 T.1024
1 0.005 0.005 0.990 -2.956420 -2.919654 -2.921704 -2.886429 -2.879443
2 0.005 0.240 0.755 -2.959242 -2.917744 -2.923356 -2.885018 -2.881905
3 0.005 0.490 0.505 -2.959395 -2.915798 -2.927405 -2.886637 -2.885186
4 0.005 0.740 0.255 -2.957763 -2.912088 -2.934518 -2.890182 -2.889484
5 0.005 0.990 0.005 -2.937999 -2.857668 -2.864637 -2.819950 -2.820588
6 0.250 0.005 0.745 -2.987160 -2.986864 -2.897846 -2.865875 -2.911572
7 0.250 0.240 0.510 -3.034868 -2.979375 -2.924888 -2.875446 -2.898752
8 0.250 0.490 0.260 -3.052279 -2.995942 -2.969414 -2.926178 -2.918958
9 0.250 0.740 0.010 -3.197169 -3.263336 -3.258011 -3.202253 -3.248068
10 0.500 0.005 0.495 -3.031267 -3.038585 -2.936348 -2.921126 -2.908868
11 0.500 0.240 0.260 -3.142031 -3.086536 -3.026555 -3.079825 -2.871080
12 0.500 0.490 0.010 -3.383052 -3.410789 -3.431221 -3.367462 -3.332024
13 0.750 0.005 0.245 -3.209441 -3.170385 -3.112472 -3.141569 -2.925559
14 0.750 0.240 0.010 -3.452131 -3.517234 -3.428402 -3.477691 -3.178128
15 0.990 0.005 0.005 -3.427804 -3.491805 -3.298037 -3.290127 -3.087541
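"replacement has length zero" means the logical filter matched no row of CritMat, so the right-hand side is empty. One common way that happens, assuming some parameter values were computed rather than typed in, is floating-point representation: two numbers can print identically yet fail an == test. A sketch with toy values (not the question's matrices) and a hypothetical tolerance-based helper, match_rows():

```r
# Toy matrix in the same layout as CritMat: alpha, beta, omega, one critical value
crit <- matrix(c(0.005, 0.3, 0.695, -2.96,
                 0.250, 0.5, 0.250, -3.03),
               ncol = 4, byrow = TRUE)

# beta is computed here, so it prints as 0.3 but is not exactly 0.3
key <- c(alpha = 0.005, beta = 0.1 + 0.2, omega = 0.695)

exact <- crit[crit[, 1] == key["alpha"] & crit[, 2] == key["beta"] &
                crit[, 3] == key["omega"], , drop = FALSE]
nrow(exact)  # no match, which is what makes results[r, ] <- ... fail

# Hypothetical helper: compare with a small tolerance instead of ==
match_rows <- function(mat, alpha, beta, omega, tol = 1e-8) {
  hit <- abs(mat[, 1] - alpha) < tol &
         abs(mat[, 2] - beta)  < tol &
         abs(mat[, 3] - omega) < tol
  mat[hit, , drop = FALSE]
}
found <- match_rows(crit, key["alpha"], key["beta"], key["omega"])
```

Rounding both sides first (for example round(x, 3)) is another option when the values are known to have three decimals.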
I've got data in the following format.
P10_neg._qn P11_neg._qn P12_neg._qn P14_neg._qn P17_neg._qn P24_neg._qn P25_neg._qn
1 -0.025 -0.037 -0.032 -0.061 -0.176 0.033 -0.011
2 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
3 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
4 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
5 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
6 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
What is the best way to check, for every row, how many entries are greater than a threshold (0.1, for instance) and return a vector of counts?
You can use the rowSums function for this task. Assuming that dat is your matrix:
rowSums(dat > 0.1)
Using the sample data provided, we have:
dat <- read.table(text = ' P10_neg._qn P11_neg._qn P12_neg._qn P14_neg._qn P17_neg._qn P24_neg._qn P25_neg._qn
1 -0.025 -0.037 -0.032 -0.061 -0.176 0.033 -0.011
2 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
3 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
4 0.033 -0.127 0.042 0.014 0.097 0.105 0.048
5 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087
6 -0.029 -0.125 0.003 -0.098 0.117 0.039 0.087',
row.names = 1, header = TRUE)
rowSums(dat > 0.1)
## 1 2 3 4 5 6
## 0 1 1 1 1 1
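The rowSums() trick works because dat > 0.1 produces a logical matrix, and summing logicals counts TRUE as 1 and FALSE as 0. A small sketch using rows 1 and 3 of the sample data:

```r
m <- rbind(c(-0.025, -0.037, -0.032, -0.061, -0.176, 0.033, -0.011),
           c( 0.033, -0.127,  0.042,  0.014,  0.097, 0.105,  0.048))

above <- m > 0.1   # logical matrix: TRUE where the entry exceeds the threshold
above
rowSums(above)     # TRUE counts as 1, FALSE as 0
```

If the data can contain NA, rowSums(m > 0.1, na.rm = TRUE) skips the missing comparisons instead of returning NA for that row.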
An equivalent alternative using apply():
apply(dat, 1, function(x) sum(x > .1))
# [1] 0 1 1 1 1 1
Here is an Rcpp version:
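#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// Count, for each row of M, how many entries are greater than val
// [[Rcpp::export]]
IntegerVector countGreaterThan2(NumericMatrix M, double val) {
  IntegerVector res;
  for (int i = 0; i < M.nrow(); i++) {
    NumericVector row = M(i, _);
    int num = static_cast<int>(std::count_if(row.begin(), row.end(),
        [&val](const double& x) -> bool { return x > val; }));
    res.push_back(num);  // push_back reallocates and copies on every call
  }
  return res;
}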
But rowSums is still faster:
system.time(rowSums(dfx>0.2))
user system elapsed
0.01 0.00 0.02
system.time(countGreaterThan2(dfx,0.2))
user system elapsed
0.06 0.00 0.06