How to count these transitions in R
Given a table of values, where A = state of system, B = length of state, and C = cumulative length of states:
A B C
1 1.16 1.16
0 0.51 1.67
1 1.16 2.84
0 0.26 3.10
1 0.59 3.69
0 0.39 4.08
1 0.78 4.85
0 0.90 5.75
1 0.78 6.53
0 0.26 6.79
1 0.12 6.91
0 0.51 7.42
1 0.26 7.69
0 0.51 8.20
1 0.39 8.59
0 0.51 9.10
1 1.16 10.26
0 1.10 11.36
1 0.59 11.95
0 0.51 12.46
How would I use R to calculate the number of transitions (where A gives the state) per constant interval length, where the intervals are consecutive and the interval width can be any arbitrary number (I chose a width of 2 in my example)? For example, using the table values we count 2 transitions in the interval 0-2, 3 transitions in >2-4, 3 transitions in >4-6, and so on.
This is straightforward in R. All you need is column C and ?cut. Consider:
d <- read.table(text="A B C
1 1.16 1.16
0 0.51 1.67
1 1.16 2.84
0 0.26 3.10
1 0.59 3.69
0 0.39 4.08
1 0.78 4.85
0 0.90 5.75
1 0.78 6.53
0 0.26 6.79
1 0.12 6.91
0 0.51 7.42
1 0.26 7.69
0 0.51 8.20
1 0.39 8.59
0 0.51 9.10
1 1.16 10.26
0 1.10 11.36
1 0.59 11.95
0 0.51 12.46", header=TRUE)
fi <- cut(d$C, breaks=seq(from=0, to=14, by=2))
table(fi)
# fi
# (0,2] (2,4] (4,6] (6,8] (8,10] (10,12] (12,14]
# 2 3 3 5 3 3 1
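Since the interval length can be arbitrary, the width can also be made a parameter so the breaks always cover the data. A minimal sketch along the same lines (the name width is just illustrative, not from the answer above):

width <- 2   # any interval length
breaks <- seq(from=0, to=ceiling(max(d$C)/width)*width, by=width)
table(cut(d$C, breaks=breaks))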
Related
What could be the R code for making an index?
Suppose we run a regression with dummy variables and obtain results like

Yt = a + b1*X1 + b2*X2 + b3*X3 + ... + bn*Xn + C1*Dum1 + C2*Dum2 + C3*Dum3 + ... + Cn*DumN

I want to create an index I such that

I = W1*Dum1 + W2*Dum2 + W3*Dum3 + ... + Wn*DumN

where the Wi are weights equal to the regression coefficients of the dummies.
Suppose you have a data frame like this.

head(dat)
#      X1    X2     Y D1 D2 D3    W1    W2    W3
# 1  1.37 -0.31  0.21  0  1  0 -1.04  1.39 -0.03
# 2 -0.56 -1.78 -0.36  0  1  0 -0.09 -0.48  0.11
# 3  0.36 -0.17  0.76  0  1  0  0.62  0.65 -0.49
# 4  0.63  1.21 -0.73  0  1  0 -0.95  1.39 -0.50
# 5  0.40  1.90 -1.37  0  1  0 -0.54 -1.11 -1.66
# 6 -0.11 -0.43  0.43  0  1  0  0.58 -0.86 -0.38

You may use which.max to subset. (Note: dummies and weights are the vectors of dummy and weight column names; they need to be defined for the apply() call to work.)

dummies <- paste0("D", 1:3)
weights <- paste0("W", 1:3)
dat <- transform(dat, weight=apply(dat, 1, function(x) x[weights][which.max(x[dummies])]))
dat
#       X1    X2     Y D1 D2 D3    W1    W2    W3 weight
# 1   1.37 -0.31  0.21  0  1  0 -1.04  1.39 -0.03   1.39
# 2  -0.56 -1.78 -0.36  0  1  0 -0.09 -0.48  0.11  -0.48
# 3   0.36 -0.17  0.76  0  1  0  0.62  0.65 -0.49   0.65
# 4   0.63  1.21 -0.73  0  1  0 -0.95  1.39 -0.50   1.39
# 5   0.40  1.90 -1.37  0  1  0 -0.54 -1.11 -1.66  -1.11
# 6  -0.11 -0.43  0.43  0  1  0  0.58 -0.86 -0.38  -0.86
# 7   1.51 -0.26 -0.81  0  0  1  0.77 -1.13 -0.51  -0.51
# 8  -0.09 -1.76  1.44  0  0  1  0.46 -1.46  2.70   2.70
# 9   2.02  0.46 -0.43  1  0  0 -0.89  0.08 -1.36  -0.89
# 10 -0.06 -0.64  0.66  0  1  0 -1.10  0.65  0.14   0.65
# 11  1.30  0.46  0.32  0  0  1  1.51  1.20 -1.49  -1.49
# 12  2.29  0.70 -0.78  0  1  0  0.26  1.04 -1.47   1.04
# 13 -1.39  1.04  1.58  0  1  0  0.09 -1.00  0.12  -1.00
# 14 -0.28 -0.61  0.64  0  0  1 -0.12  1.85 -1.00  -1.00
# 15 -0.13  0.50  0.09  0  0  1 -1.19 -0.67  0.00   0.00
# 16  0.64 -1.72  0.28  0  1  0  0.61  0.11 -0.43   0.11
# 17 -0.28 -0.78  0.68  0  0  1 -0.22 -0.42 -0.61  -0.61
# 18 -2.66 -0.85  0.09  1  0  0 -0.18 -0.12 -2.02  -0.18
# 19 -2.44 -2.41 -2.99  0  0  1  0.93  0.19 -1.22  -1.22
# 20  1.32  0.04  0.28  0  1  0  0.82  0.12  0.18   0.12

Data: I show here how I created the example data, which probably answers additional questions.

set.seed(42)
dat <- data.frame(matrix(round(rnorm(60), 2), 20, 3))
dat$X4 <- rbinom(20, 2, .5)
names(dat)[3] <- "Y"
dat <- cbind(dat[-4], setNames(data.frame(model.matrix(X1 ~ 0 + factor(X4), dat)), paste0("D", 1:3)))
dat <- cbind(dat, setNames(data.frame(matrix(round(rnorm(60), 2), 20, 3)), paste0("W", 1:3)))
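Since the question asks for an index I = W1*Dum1 + W2*Dum2 + ..., a row-wise sum of products is an alternative. A minimal sketch, using the dummies and weights name vectors defined above (the column name index is just illustrative):

dat$index <- rowSums(dat[dummies] * dat[weights])   # W1*D1 + W2*D2 + W3*D3 for each row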
Cumulative sum based on a factor in R
I have the following dataset, and I need to accumulate and sum the value while the factor is 0, and then put the accumulated sum where I find a factor != 0. I've tried the loop below, but it didn't work at all.

for(i in dataset$Variable.1) {
  ifelse(dataset$Factor == 0,
         dataset$teste <- dataset$Variable.1 + i,
         dataset$teste <- dataset$Variable.1)
  i <- dataset$Variable.1
  print(i)
}

Any ideas? Below is an example of the dataset; I wish to get the "Result" column. In the real data I also have a negative factor (-1).

         Date Factor Variable.1 Result
1  03/02/2018      0       0.75   0.75
2  04/02/2018      0       0.75   1.50
3  05/02/2018      1       0.96   2.46
4  06/02/2018      1       0.76   0.76
5  07/02/2018      0       1.35   1.35
6  08/02/2018      1       0.70   2.05
7  09/02/2018      1       2.02   2.02
8  10/02/2018      0       0.00   0.00
9  11/02/2018      0       0.00   0.00
10 12/02/2018      0       0.20   0.20
11 13/02/2018      0       0.13   0.33
12 14/02/2018      0       1.64   1.97
13 15/02/2018      0       0.03   2.00
14 16/02/2018      1       0.51   2.51
15 17/02/2018      1       0.00   0.00
16 18/02/2018      0       0.00   0.00
17 19/02/2018      0       0.83   0.83
18 20/02/2018      1       0.42   1.25
19 21/02/2018      1       0.17   0.17
20 22/02/2018      1       0.97   0.97
21 23/02/2018      0       0.92   0.92
22 24/02/2018      0       0.00   0.92
23 25/02/2018      0       0.00   0.92
24 26/02/2018      1       0.19   1.11
25 27/02/2018      1       0.87   0.87
26 28/02/2018      1       0.85   0.85
27 01/03/2018      1       1.95   1.95
28 02/03/2018      1       0.54   0.54
29 03/03/2018      1       0.00   0.00
30 04/03/2018      0       0.00   0.00
31 05/03/2018      0       1.17   1.17
32 06/03/2018      1       0.25   1.42
33 07/03/2018      1       1.45   1.45

Thanks in advance.
If you want to stick with the for-loop, you can try this code:

DF$Result <- NA
prev <- 0
for(i in seq_len(nrow(DF))){
  DF$Result[i] <- DF$Variable.1[i] + prev
  if(DF$Factor[i] == 1) prev <- 0 else prev <- DF$Result[i]
}
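The same logic can also be vectorized in base R. A minimal sketch, assuming the data frame is called DF as in the loop above (a new group starts right after every row with Factor == 1):

grp <- cumsum(c(0, head(DF$Factor, -1) == 1))        # group id that resets after each Factor == 1
DF$Result <- ave(DF$Variable.1, grp, FUN = cumsum)   # cumulative sum within each group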
Iteratively, try something like:

a=as.data.frame(cbind(Factor=c(0,0,1,1,0,1,1, rep(0,3),1),
                      Variable.1=c(0.75,0.75,0.96,0.71,1.35,0.7, 0.75,0.96,0.71,1.35,0.7)))
Result=0
aux=NULL
for (i in 1:nrow(a)){
  if (a$Factor[i]==0){
    Result=Result+a$Variable.1[i]
    aux=c(aux,Result)
  } else{
    Result=Result+a$Variable.1[i]
    aux=c(aux,Result)
    Result=0
  }
}
a$Results=aux
a
   Factor Variable.1 Results
1       0       0.75    0.75
2       0       0.75    1.50
3       1       0.96    2.46
4       1       0.71    0.71
5       0       1.35    1.35
6       1       0.70    2.05
7       1       0.75    0.75
8       0       0.96    0.96
9       0       0.71    1.67
10      0       1.35    3.02
11      1       0.70    3.72
A possibility using tidyverse and data.table:

library(dplyr)
library(data.table)   # for rleid()

df %>%
  mutate(temp = ifelse(Factor == 1 & lag(Factor) == 1, NA, 1),   # Marking the rows after the first 1 in "Factor" as NA
         temp = ifelse(!is.na(temp), rleid(temp), NA)) %>%       # Run length along non-NA values
  group_by(temp) %>%                                             # Grouping by run length
  mutate(Result = ifelse(!is.na(temp), cumsum(Variable.1), Variable.1)) %>%  # Cumulative sum of desired rows
  ungroup() %>%
  select(-temp)                                                  # Removing the redundant variable

   Date       Factor Variable.1 Result
   <chr>       <int>      <dbl>  <dbl>
 1 03/02/2018      0      0.750  0.750
 2 04/02/2018      0      0.750  1.50
 3 05/02/2018      1      0.960  2.46
 4 06/02/2018      1      0.760  0.760
 5 07/02/2018      0      1.35   1.35
 6 08/02/2018      1      0.700  2.05
 7 09/02/2018      1      2.02   2.02
 8 10/02/2018      0      0.     0.
 9 11/02/2018      0      0.     0.
10 12/02/2018      0      0.200  0.200
Why does a powerful EC2 machine give results close to those of a 16 GB RAM laptop?
How can it be that an m4.4xlarge EC2 instance with 64 GB RAM and 16 logical cores gets almost the same results as my 16 GB RAM laptop with 4 logical cores? I used the benchmark package for the test (output attached below). Is there any way to better configure the m4.4xlarge EC2 instance?

This is the system info for the m4.4xlarge EC2 instance:

Sys.info()
          sysname              release              version
        "Windows" "Server >= 2012 x64"         "build 9200"
         nodename              machine                login
"EC2AMAZ-4R7L3U6"             "x86-64"      "Administrator"
             user       effective_user
  "Administrator"      "Administrator"

> library("parallel", lib.loc="C:/Program Files/R/R-3.4.2/library")
> detectCores(logical = FALSE)
[1] 8
> detectCores(logical = TRUE)
[1] 16

This is my laptop's system info:

Sys.info()
   sysname    release                      version
 "Windows"    "7 x64" "build 7601, Service Pack 1"
  nodename    machine                        login
 "USER-PC"   "x86-64"                       "user"
      user effective_user
    "user"         "user"

> detectCores(logical = TRUE)
[1] 4
> detectCores(logical = FALSE)
[1] 2

The test results are:

For the m4.4xlarge EC2 instance:

> benchmark_std()
   user system elapsed          test test_group cores
1  1.02 0.02 1.03 fib prog 0
2  0.75 0.00 0.75 fib prog 0
3  0.77 0.00 0.77 fib prog 0
4  1.34 0.07 1.41 gcd prog 0
5  1.20 0.06 1.27 gcd prog 0
6  1.05 0.08 1.12 gcd prog 0
7  0.19 0.02 0.21 hilbert prog 0
8  0.34 0.02 0.36 hilbert prog 0
9  0.35 0.02 0.36 hilbert prog 0
10 16.37 0.03 16.40 toeplitz prog 0
11 16.46 0.00 16.46 toeplitz prog 0
12 16.37 0.00 16.37 toeplitz prog 0
13 1.41 0.04 1.44 escoufier prog 0
14 1.25 0.00 1.25 escoufier prog 0
15 1.27 0.00 1.27 escoufier prog 0
16 0.91 0.01 0.92 manip matrix_cal 0
17 0.92 0.02 0.94 manip matrix_cal 0
18 0.78 0.00 0.78 manip matrix_cal 0
19 1.01 0.02 1.03 power matrix_cal 0
20 1.03 0.00 1.03 power matrix_cal 0
21 1.03 0.02 1.04 power matrix_cal 0
22 0.82 0.00 0.83 sort matrix_cal 0
23 0.81 0.02 0.83 sort matrix_cal 0
24 0.80 0.03 0.83 sort matrix_cal 0
25 8.83 0.00 8.83 cross_product matrix_cal 0
26 8.83 0.01 8.85 cross_product matrix_cal 0
27 8.85 0.02 8.86 cross_product matrix_cal 0
28 5.92 0.01 5.93 lm matrix_cal 0
29 5.92 0.00 5.92 lm matrix_cal 0
30 5.90 0.02 5.93 lm matrix_cal 0
31 5.17 0.02 5.18 cholesky matrix_fun 0
32 5.03 0.02 5.05 cholesky matrix_fun 0
33 5.03 0.01 5.04 cholesky matrix_fun 0
34 2.89 0.00 2.89 determinant matrix_fun 0
35 2.84 0.00 2.84 determinant matrix_fun 0
36 2.87 0.00 2.87 determinant matrix_fun 0
37 0.73 0.00 0.73 eigen matrix_fun 0
38 0.74 0.00 0.74 eigen matrix_fun 0
39 0.73 0.00 0.74 eigen matrix_fun 0
40 0.28 0.00 0.28 fft matrix_fun 0
41 0.28 0.00 0.29 fft matrix_fun 0
42 0.28 0.00 0.29 fft matrix_fun 0
43 2.14 0.02 2.17 inverse matrix_fun 0
44 2.17 0.00 2.17 inverse matrix_fun 0
45 2.16 0.00 2.16 inverse matrix_fun 0

For the 16 GB laptop:

   user system elapsed          test test_group cores
1  0.67 0.00 0.67 fib prog 0
2  0.69 0.00 0.68 fib prog 0
3  0.67 0.00 0.67 fib prog 0
4  1.20 0.05 1.26 gcd prog 0
5  1.18 0.02 1.20 gcd prog 0
6  1.22 0.01 1.23 gcd prog 0
7  0.52 0.02 0.53 hilbert prog 0
8  0.48 0.05 0.53 hilbert prog 0
9  0.25 0.01 0.27 hilbert prog 0
10 16.08 0.00 16.18 toeplitz prog 0
11 18.87 0.00 18.90 toeplitz prog 0
12 16.81 0.00 16.86 toeplitz prog 0
13 1.17 0.00 1.17 escoufier prog 0
14 1.21 0.00 1.22 escoufier prog 0
15 1.19 0.00 1.21 escoufier prog 0
16 0.95 0.00 0.95 manip matrix_cal 0
17 1.14 0.05 1.19 manip matrix_cal 0
18 0.67 0.03 0.72 manip matrix_cal 0
19 0.86 0.00 0.85 power matrix_cal 0
20 0.89 0.00 0.89 power matrix_cal 0
21 0.87 0.00 0.88 power matrix_cal 0
22 0.75 0.00 0.75 sort matrix_cal 0
23 0.71 0.03 0.75 sort matrix_cal 0
24 0.71 0.00 0.73 sort matrix_cal 0
25 7.99 0.00 8.01 cross_product matrix_cal 0
26 7.96 0.07 8.03 cross_product matrix_cal 0
27 7.96 0.00 7.97 cross_product matrix_cal 0
28 5.38 0.00 5.41 lm matrix_cal 0
29 5.50 0.00 5.49 lm matrix_cal 0
30 5.50 0.00 5.51 lm matrix_cal 0
31 4.55 0.03 4.63 cholesky matrix_fun 0
32 4.68 0.00 4.73 cholesky matrix_fun 0
33 4.54 0.02 4.60 cholesky matrix_fun 0
34 3.06 0.00 3.11 determinant matrix_fun 0
35 3.41 0.00 3.42 determinant matrix_fun 0
36 3.44 0.00 3.46 determinant matrix_fun 0
37 0.98 0.00 0.99 eigen matrix_fun 0
38 0.79 0.00 0.79 eigen matrix_fun 0
39 1.03 0.00 1.06 eigen matrix_fun 0
40 0.40 0.00 0.42 fft matrix_fun 0
41 0.39 0.00 0.39 fft matrix_fun 0
42 0.39 0.00 0.39 fft matrix_fun 0
43 2.75 0.00 2.74 inverse matrix_fun 0
44 2.70 0.00 2.79 inverse matrix_fun 0
45 2.70 0.00 2.69 inverse matrix_fun 0
Your benchmarks are computing mathematical functions that aren't easy to parallelize, so it's likely that they aren't parallelized, which means only one CPU core does the work. Thus the number of cores in your machine (virtual or not) won't affect performance; only the speed of a single core will. And on a single core your laptop is roughly as fast as the EC2 instance, which is what these results reflect. Since you can't simply "speed up" the EC2 machines (they're already running as fast as they can), you can't get better results with these benchmarks. Try a benchmark that does some parallel processing, and you'll see the huge benefit of an EC2 instance vs. your laptop.
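For illustration, a minimal sketch of a workload that does scale with cores, using the base parallel package (the toy task, its size, and the number of iterations are arbitrary choices, not part of the benchmark package):

library(parallel)

heavy <- function(i) {                      # CPU-bound toy task
  m <- matrix(rnorm(500 * 500), 500)
  sum(svd(m)$d)
}

system.time(lapply(1:32, heavy))            # one core

cl <- makeCluster(detectCores(logical = TRUE))
system.time(parLapply(cl, 1:32, heavy))     # all cores
stopCluster(cl)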
Error "$ operator is invalid for atomic vectors" or result "No Bins"
I have a problem when using the smbinning package. My dataset consists of ratio and Good_Bad:

ratio:
0.40 0.41 0.54 0.61 0.64 0.70 0.74 0.74 0.78 0.79 0.80 0.81 0.82 0.83 0.87 0.89 0.89 1.03 1.03 1.06 1.07 1.08 1.08 1.09 1.09 1.10 1.12 1.12 1.13 1.15 1.18 1.20 1.23 1.24 1.24 1.33 1.33 1.36 1.38 1.38 1.39 1.40 1.42 1.44 1.47 1.48 1.48 1.53 1.55 1.55 1.60 1.62 1.65 1.67 1.72 1.73 1.74 1.75 1.85 1.86 1.89 1.90 2.02 2.04 2.07 2.09 2.18 2.20 2.22 2.24 2.39 2.41 2.43 2.46 2.76 2.85 2.91 3.05 3.75 4.21 5.18 5.33 8.70

Good_Bad:
0 0 0 1 0 0 1 0 1 0 1 0 0 0 1 0 0 0 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 1 0 0 1 1 1 1 1 1 1 0

Code:

binning <- smbinning(df=dataset, y="Good_Bad", x="ratio", p=0.05)
binning$ivtable
Error in binning$ivtable : $ operator is invalid for atomic vectors
binning
[1] "No Bins"

Why do I get this error, and why is the result "No Bins"?
Have you checked that the column "ratio" is stored as numeric? If it is stored as a factor or character, you would need to use smbinning.factor().
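A quick way to check and, if needed, convert, as a minimal sketch using the column names from the question:

str(dataset$ratio)        # should report "num"; a factor or character here explains the problem
dataset$ratio <- as.numeric(as.character(dataset$ratio))   # convert if it was read in as a factor
# The target column Good_Bad should likewise be a numeric 0/1 vector, not a factor.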
Get the paired sample in R [closed]
Closed. This question needs details or clarity. It is not currently accepting answers. Closed 9 years ago.

X <- scan()
1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1

Z <- scan()
-0.05 0.11 -0.01 1.08 0.68 -1.79 -0.12 -0.06 0.17 -1.35 1.55 0.60 -1.42 -1.21 0.97 0.23 0.20 0.89 0.28 0.56 1.02 -0.32 0.20 -1.35 0.53 -0.52 -0.07 -1.07 0.10 0.53 0.97 0.32 -0.07 0.98 -1.23 0.72 -0.09 0.31 1.25 0.60 1.16 -0.98 1.63 0.72 0.24 -0.02 -1.13 0.56 0.78 1.75 -0.01 -0.44 0.47 -0.21 2.06 2.19 -0.94 -0.36 1.35 -1.35 1.50 0.13 -0.20 -0.57 -0.14 -1.34 -1.17 2.04 0.21 1.47 -1.20 -0.60 0.15 -0.64 -0.71 0.24 -0.86 -1.39 -0.63 -1.25 0.40 -0.76 0.73 -0.15 0.09 0.35 -0.19 0.29 0.56 0.82 -0.28 0.63 1.35 -0.04 1.99 1.12 -1.91 0.26 -1.18 -0.10

In the vector X, 0 is the control group and 1 is the case group. I want to match these cases and controls based on the Z vector. That is, I want to match elements of X based on Z and get the samples from the matched data. What should I do?
The other answers seem to think that you're looking for subsetting, but I'm assuming (based on your use of the language "case" and "controls") that you're talking about matching in a statistical sense. If so, it sounds like you want something like the functionality provided by the Matching package, like the following:

library(Matching)
out <- Match(Tr=X, X=Z)
out$mdata   # list of `Y` outcome vector (if applicable),
            # `Tr` treatment vector, and
            # `X` matrix of covariates for the matched sample

If you also have an outcome measure, you can specify that in Match and it will give you treatment effect estimates. There are also other packages to do matching, like MatchIt, cem, and nonrandom (the last of which has apparently been removed from CRAN), depending on what particular matching procedure you're going for.
I suppose you are looking for

Z[as.logical(X)]   # case
Z[!X]              # control
I suppose your question is about subsetting; here are some examples:

# Data
X <- c(1,1,1,0,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,1,1,0,1,1,1,1,0,0,1,1,0,0,1,1,1,1,1,1,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1)
Z <- c(-0.05,0.11,-0.01,1.08,0.68,-1.79,-0.12,-0.06,0.17,-1.35,1.55,0.60,-1.42,-1.21,0.97,0.23,0.20,0.89,0.28,0.56,1.02,-0.32,0.20,-1.35,0.53,-0.52,-0.07,-1.07,0.10,0.53,0.97,0.32,-0.07,0.98,-1.23,0.72,-0.09,0.31,1.25,0.60,1.16,-0.98,1.63,0.72,0.24,-0.02,-1.13,0.56,0.78,1.75,-0.01,-0.44,0.47,-0.21,2.06,2.19,-0.94,-0.36,1.35,-1.35,1.50,0.13,-0.20,-0.57,-0.14,-1.34,-1.17,2.04,0.21,1.47,-1.20,-0.60,0.15,-0.64,-0.71,0.24,-0.86,-1.39,-0.63,-1.25,0.40,-0.76,0.73,-0.15,0.09,0.35,-0.19,0.29,0.56,0.82,-0.28,0.63,1.35,-0.04,1.99,1.12,-1.91,0.26,-1.18,-0.10)
myMatrix <- cbind(X, Z)

# Subsetting
myMatrixControls <- myMatrix[ myMatrix[,1]==0, ]
myMatrixCases    <- myMatrix[ myMatrix[,1]==1, ]

# Example: get sum per group
sumZ_Controls <- sum(myMatrix[ myMatrix[,1]==0, 2])
sumZ_Cases    <- sum(myMatrix[ myMatrix[,1]==1, 2])