I am currently trying to program a function that shows me possible ways to finish a certain number of points in 1 to 3 dart throws. Only the range from 170 to 2 points is necessary.
Example: I have 120 points left. I want a function that gives out possible ways to finish 120 points. For example by throwing 40 - 40 - 40; 60 - 60 or 57 - 57 - 6 and so on...
I am new to programming in R and to programming in general and have absolutely no idea how and where to start. Does anyone have a recommendation for me? Is there a specific function that can be applied in this case?
Thank you very much in advance!
Here is another approach. One thing you would have to fix (if needed): this solution makes separate 'solutions' for, let's say, a single 2 and a double 1.
so, 4 can be ended with
single 1, single 1, double 2
single 2, double 2
double 2, double 2
and this gets worse on the three-dart finishes...
library(data.table)  # provides rbindlist() and := used below
singles <- c(1:20, 25)
doubles <- 2 * singles
triples <- 3 * 1:20
# all possible endings
one.dart <- tidyr::crossing( first = doubles )
two.dart <- tidyr::crossing( first = unique( c(singles, doubles, triples ) ),
second = doubles )
three.dart <- tidyr::crossing( first = unique( c( singles, doubles, triples ) ),
second = unique( c( singles, doubles, triples ) ),
third = doubles )
#bind together
DT <- rbindlist( list( one.dart, two.dart, three.dart), use.names = TRUE, fill = TRUE )
#calculate finish total and number of darts used
DT[, finish := rowSums(.SD, na.rm = TRUE) ]
DT[, darts := rowSums( !is.na(.SD)), .SDcols = 1:3 ]
calculate_finish <- function( x ) {
DT[ finish == x, 1:3]
}
calculate_finish( 120 )
# first second third
# 1: 10 60 50
# 2: 13 57 50
# 3: 16 54 50
# 4: 19 51 50
# 5: 20 50 50
# ---
# 130: 60 40 20
# 131: 60 42 18
# 132: 60 48 12
# 133: 60 50 10
# 134: 60 54 6
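The table above stores only point values, so its rows cannot tell a single 2 from a double 1. One way around that, sketched below under the usual rule that a checkout ends on a double, is to carry a label next to each value. The names `throws`, `finishers` and `checkouts` are hypothetical and not part of the answer above, and orderings such as S1 T1 versus T1 S1 are still listed separately.

```r
# Label every scoring field: S = single, D = double, T = triple (25 = bull).
base <- c(1:20, 25)
throws <- data.frame(
  label = c(paste0("S", base), paste0("D", base), paste0("T", 1:20)),
  value = c(base, 2 * base, 3 * (1:20)),
  stringsAsFactors = FALSE
)
# A checkout must end on a double.
finishers <- throws[substr(throws$label, 1, 1) == "D", ]

checkouts <- function(target) {
  n <- nrow(throws)
  m <- nrow(finishers)
  # One dart: the finishing double on its own.
  one <- finishers$label[finishers$value == target]
  # Two darts: any field, then a double.
  g2 <- expand.grid(first = seq_len(n), last = seq_len(m))
  ok2 <- throws$value[g2$first] + finishers$value[g2$last] == target
  two <- paste(throws$label[g2$first[ok2]], finishers$label[g2$last[ok2]])
  # Three darts: any two fields, then a double.
  g3 <- expand.grid(first = seq_len(n), second = seq_len(n), last = seq_len(m))
  ok3 <- throws$value[g3$first] + throws$value[g3$second] +
    finishers$value[g3$last] == target
  three <- paste(throws$label[g3$first[ok3]], throws$label[g3$second[ok3]],
                 finishers$label[g3$last[ok3]])
  c(one, two, three)
}

checkouts(4)    # labelled ways to finish 4
checkouts(170)  # the highest possible checkout
```

Because the labels travel with the values, "S2 D1" and "D1 D1" show up as two distinct finishes instead of two identical rows of 2 and 2.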
The question you have is rather general, more like outsourcing your programming than asking for help with a specific part of the code. Anyhow, here is a simplified solution that is far from perfect programming-wise but does the job. I hope it helps you understand programming and functions, plus a bit of R, given that you have a specific problem.
# we need this library to be able to use the %>% operator below
library(dplyr)
# vector of all the possible field values (I don't know them all, but you can look them up and fill in the vector)
dboard <- c(1, 2, 3, 4, 5)
# expand this to a grid of all possible combinations
all_combinations <- expand.grid(dboard, dboard,dboard ,stringsAsFactors = FALSE)
# your function with two inputs
get_throw_combinations <- function(remaining_points, throws){
# use the grid of possible combinations defined outside the function
res <- all_combinations %>%
# select the first `throws` columns
dplyr::select(1:throws) %>%
# reduce redundancy (happens when throws < 3)
dplyr::distinct() %>%
# filter those where the rowsum is equal to the target
dplyr::filter(rowSums(.) == remaining_points)
# return the result
return (res)
}
# run your function for 5 points with two throws
get_throw_combinations(5, 2)
# results R will display
Var1 Var2
1 4 1
2 3 2
3 2 3
4 1 4
Here is a brute-force approach using expand.grid + subset
v <- 2:170
res <- lapply(
1:3,
function(k) subset(u <- expand.grid(rep(list(v), k)), rowSums(u) == 120)
)
I have an imported dataframe Measurements that contains many observations from an experiment.
Measurements <- data.frame(X = 1:4,
Data = c(90, 85, 100, 105))
X Data
1 90
2 85
3 100
4 105
I want to add another column Condition that specifies the treatment group for each datapoint. I know which observation ranges are from which condition (e.g. observations 1:2 are from the control and observations 3:4 are from the experimental group).
I have devised two solutions already that give the desired output but neither are ideal. First:
Measurements["Condition"] <- c(rep("Cont", 2), rep("Exp", 2))
X Data Condition
1 90 Cont
2 85 Cont
3 100 Exp
4 105 Exp
The benefit of this is it is one line of code/one command. But this is not ideal since I need to do math outside separately (e.g. 3:4 = 2 obs, etc) which can be tricky/unclear/indirect with larger datasets and more conditions (e.g. 47:83 = ? obs, etc) and would be liable to perpetuating errors since a small error in length for an early assignment would also shift the assignment of later groups (e.g. if rep of Cont is mistakenly 1, then Exp gets mistakenly assigned to 2:3 too).
I also thought of assigning like this, which gives the desired output too:
Measurements[1:2, "Condition"] <- "Cont"
Measurements[3:4, "Condition"] <- "Exp"
X Data Condition
1 90 Cont
2 85 Cont
3 100 Exp
4 105 Exp
This makes it more clear/simple/direct which rows will receive which assignment, but this requires separate assignments and repetition. I feel like there should be a way to "vectorize" this assignment, which is the solution I'm looking for.
I'm having trouble finding complex indexing rules online. Here is my first intuitive guess of how to achieve this:
Measurements[c(1:2, 3:4), "Condition"] <- list("Cont", "Exp")
X Data Condition
1 90 Cont
2 85 Cont
3 100 Cont
4 105 Cont
But this doesn't work. It seems to combine 1:2 and 3:4 into a single equivalent range (1:4) and assigns only the first condition to this range, which suggests I also need to specify the column again. When I try to specify the column again:
Measurements[c(1:2, 3:4), c("Condition", "Condition")] <- list("Cont", "Exp")
X Data Condition Condition.1
1 90 Cont Exp
2 85 Cont Exp
3 100 Cont Exp
4 105 Cont Exp
For some reason this creates a second new column (??), and it again seems to combine 1:2 and 3:4 into essentially 1:4. So I think I need to index the two row ranges in a way that keeps them separate and only specify the column once, but I'm stuck on how to do this. I assume the solution is simple but I can't seem to find an example of what I'm trying to do. Maybe to keep them separate I do have to assign them separately, but I'm hoping there is a way.
Can anyone help? Thank you a ton in advance from an R noobie!
If you already have a list of observations which belong to each condition you could use dplyr::case_when to do a conditional mutate. Depending on how you have this information stored you could use something like the following:
library(dplyr)
Measurements <- data.frame(X = 1:4,
Data = c(90, 85, 100, 105))
# set which observations belong to each condition
Cont <- 1:2
Exp <- 3:4
Measurements %>%
mutate(Condition = case_when(
X %in% Cont ~ "Cont",
X %in% Exp ~ "Exp"
))
# X Data Condition
# 1 90 Cont
# 2 85 Cont
# 3 100 Exp
# 4 105 Exp
Note that this does not require the observations to be in consecutive rows.
I normally see this done with a merge operation. The trick is getting your conditions data into a nice shape.
composeConditions <- function(...) {
conditions <- list(...)
data.frame(
X = unname(unlist(conditions)),
condition = unlist(unname(lapply(
names(conditions),
function(x) rep(x, times = length(conditions[x][[1]]))
)))
)
}
conditions <- composeConditions(Cont = 1:2, Exp = 3:4)
> conditions
X condition
1 1 Cont
2 2 Cont
3 3 Exp
4 4 Exp
merge(Measurements, conditions, by = "X")
X Data condition
1 1 90 Cont
2 2 85 Cont
3 3 100 Exp
4 4 105 Exp
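The lookup that composeConditions() builds can also be put together with rep() and lengths(), which count the rows of each range for you. A minimal sketch, assuming the ranges are held in a named list (the name `ranges` is made up here):

```r
Measurements <- data.frame(X = 1:4, Data = c(90, 85, 100, 105))

# One entry per condition; the names become the labels.
ranges <- list(Cont = 1:2, Exp = 3:4)

# Start with NA so rows not covered by any range stay visibly unassigned.
Measurements$Condition <- NA_character_
rows <- unlist(ranges, use.names = FALSE)
Measurements$Condition[rows] <- rep(names(ranges), lengths(ranges))

Measurements
```

Each range is written once, next to its label, so a mistake in one range does not shift the labels of the others.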
An efficient approach for larger datasets is to keep a vector of labels and an index pattern that says which label each row gets.
Measurements <- data.frame(X = 1:4, Data = c(90, 85, 100, 105))
dat <- c("Cont","Exp")
pattern <- c(1,1,2,2)
Or derive the pattern from the data, e.g. conditionally on Measurements$Data:
pattern <- sapply( Measurements$Data >=100, function(x){ if(x){2}else{1} } )
# [1] 1 1 2 2
Then you can add the data simply by doing:
Measurements$Condition <- dat[pattern]
# X Data Condition
#1 1 90 Cont
#2 2 85 Cont
#3 3 100 Exp
#4 4 105 Exp
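When each condition covers one consecutive block of rows, the pattern vector can be derived from just the first row of each block with findInterval(). This is a sketch under that assumption; `starts` is a hypothetical name.

```r
Measurements <- data.frame(X = 1:4, Data = c(90, 85, 100, 105))
dat <- c("Cont", "Exp")

# First observation of each block, in increasing order.
starts <- c(1, 3)

# findInterval() returns, for each X, the index of the block it falls in.
pattern <- findInterval(Measurements$X, starts)
Measurements$Condition <- dat[pattern]

Measurements
```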
I am trying to create a markov transition matrix from sequence of doctor visits for different patients. In my markov model states are the different doctors and connections are visits by patients. A patient can stay with the same provider or transition to another for the next visit. Using that information I need to create a transition matrix.
Here is a part of the data in excel. Data includes more than 30K visits to almost 100 different providers.
How can I use this excel data (or csv) and create a Markov transition matrix as number of visits, such as:
....
The matrix I need will look like this:
[image: desired transition matrix]
How can I transform my data to transition matrix with R?
I am fairly new with R and really need help.
Thank you
Here's an approach that works with your sample data.
I'll use readxl to get the data and data.table to manipulate it.
Reading data:
library(readxl)
library(data.table)
data <- setDT(read_excel("~/Desktop/Book2.xlsx"))[!is.na(PatId)]
#read_excel doesn't have the option to specify integers... silly...
data[ , (names(data)) := lapply(.SD, as.integer)]
Pre-allocate transition matrix:
provs <- data[ , sort(unique(SeenByProv))]
nprov <- length(provs)
markov <- matrix(nrow = nprov, ncol = nprov,
dimnames = list(provs, provs))
Assign row-by-row:
for (pr in provs){
markov[as.character(pr), ] <-
data[ , {nxt <- SeenByProv[which(SeenByProv == pr) + 1L]
.(prov = provs, count =
sapply(provs, function(pr2) sum(nxt == pr2, na.rm = TRUE)))}, by = PatId
][, sum(count), by = prov]$V1
}
This can probably be sped up in a few places, but it works.
I wanted to compare my method, which does not use data.table, and found it was about 44x faster (and probably more straightforward to understand).
First, I time the data.table solution from the accepted answer:
rm(list=ls())
library(readxl)
library(data.table)
############## Using data.table method() ######################
data <- setDT(read_excel("Book2.xlsx"))[!is.na(PatId)]
data[ , (names(data)) := lapply(.SD, as.integer)]
provs <- data[ , sort(unique(SeenByProv))]
nprov <- length(provs)
markov <- matrix(nrow = nprov, ncol = nprov, dimnames = list(provs, provs))
system.time( ## Timing the main loop
for (pr in provs){
markov[as.character(pr), ] <-
data[ , {nxt <- SeenByProv[which(SeenByProv == pr) + 1L]
.(prov = provs, count =
sapply(provs, function(pr2) sum(nxt == pr2, na.rm = TRUE)))}, by = PatId
][, sum(count), by = prov]$V1
}
)
# user system elapsed
# 3.128 0.000 3.135
table(markov)
#markov
# 0 1 2 3 4 5 6 7 8 9 10 11 13 22 140
#3003 308 89 34 14 11 6 4 1 3 4 1 1 1 1
Next using only base R calls:
############## Using all base R calls method() ###################
tm_matrix<-matrix(0, nrow = nprov, ncol = nprov, dimnames = list(provs, provs))
d<-read_excel("Book2.xlsx")
d<-d[!is.na(d$PatId),] # Note: Data is already ordered by PatId, DaysOfStudy
baseR<-function(tm_matrix){
d1<-cbind(d[-nrow(d),-3],d[-1,-3]); # Form the transitions and drop the DaysofStudy
colnames(d1)<-c("SeenByProv","PatId","NextProv","PatId2");
d1<-d1[d1$PatId==d1$PatId2,]; # Drop those transition between different patients
d1$SeenByProv<-as.character(d1$SeenByProv); # transform to strings to use as rownames
d1$NextProv <-as.character(d1$NextProv); # and column names
for (i in 1:nrow(d1)){ # Fill in the transition matrix
tm_matrix[d1$SeenByProv[i],d1$NextProv[i]]<-tm_matrix[d1$SeenByProv[i],d1$NextProv[i]]+1
};
return(tm_matrix)
}
system.time(tm_matrix<-baseR(tm_matrix))
# user system elapsed
# 0.072 0.000 0.072
table(tm_matrix)
#tm_matrix
# 0 1 2 3 4 5 6 7 8 9 10 11 13 22 140
#3003 308 89 34 14 11 6 4 1 3 4 1 1 1 1
all.equal(markov,tm_matrix)
#[1] TRUE
My base-R method is 3.135/0.072 = 43.54 times faster.
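The counting loop can also be handed to table(), which tabulates the (current, next) provider pairs in one call. The sketch below uses a small made-up `visits` data frame with the PatId and SeenByProv columns from the question, and assumes the rows are already ordered by patient and visit day.

```r
# Made-up sample: patient 1 sees providers 10, 10, 20;
# patient 2 sees providers 20, 30, 20, 20.
visits <- data.frame(
  PatId      = c(1, 1, 1, 2, 2, 2, 2),
  SeenByProv = c(10, 10, 20, 20, 30, 20, 20)
)

n <- nrow(visits)
from <- visits$SeenByProv[-n]                 # provider at visit i
to   <- visits$SeenByProv[-1]                 # provider at visit i + 1
same <- visits$PatId[-n] == visits$PatId[-1]  # keep pairs within one patient

provs <- sort(unique(visits$SeenByProv))
counts <- table(
  from = factor(from[same], levels = provs),
  to   = factor(to[same], levels = provs)
)
counts

# Row-normalise to get transition probabilities; a provider with no
# outgoing visits would give a row of NaN.
probs <- prop.table(counts, margin = 1)
probs
```

Using factors with a shared set of levels keeps the matrix square even when some provider never appears as a source or as a destination.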
I'm new to R and can't seem to get to grips with how to call a previous value of "self", in this case previous "b" b[-1].
b <- ( ( 1 / 14 ) * MyData$High + (( 13 / 14 )*b[-1]))
Obviously I need a NA somewhere in there for the first calculation, but I just couldn't figure this out on my own.
Adding example of what the sought after result should be (A=MyData$High):
A b
1 5 NA
2 10 0.7142...
3 15 3.0393...
4 20 4.6079...
1) for loop Normally one would just use a simple loop for this:
MyData <- data.frame(A = c(5, 10, 15, 20))
MyData$b <- 0
n <- nrow(MyData)
if (n > 1) for(i in 2:n) MyData$b[i] <- ( MyData$A[i] + 13 * MyData$b[i-1] )/ 14
MyData$b[1] <- NA
giving:
> MyData
A b
1 5 NA
2 10 0.7142857
3 15 1.7346939
4 20 3.0393586
2) Reduce It would also be possible to use Reduce. One first defines a function f that carries out the body of the loop and then we have Reduce invoke it repeatedly like this:
f <- function(b, A) (A + 13 * b) / 14
MyData$b <- Reduce(f, MyData$A[-1], 0, accumulate = TRUE)
MyData$b[1] <- NA
giving the same result.
This gives the appearance of being vectorized but in fact if you look at the source of Reduce it does a for loop itself.
3) filter Noting that the form of the problem is a recursive filter with coefficient 13/14 operating on A/14 (but with A[1] replaced with 0) we can write the following. Since filter returns a time series we use c(...) to convert it back to an ordinary vector. This approach actually is vectorized as the filter operation is performed in C.
# stats:: makes sure the time-series filter is used even if dplyr is loaded
MyData$b <- c(stats::filter(replace(MyData$A, 1, 0)/14, 13/14, method = "recursive"))
MyData$b[1] <- NA
again giving the same result.
Note: All solutions assume that MyData has at least 1 row.
There are a couple of ways you could do this.
The first method is a simple loop
df <- data.frame(A = seq(5, 25, 5))
df$b <- 0
for(i in 2:nrow(df)){
df$b[i] <- (1/14)*df$A[i]+(13/14)*df$b[i-1]
}
df
A b
1 5 0.0000000
2 10 0.7142857
3 15 1.7346939
4 20 3.0393586
5 25 4.6079758
This doesn't give the exact values given in the expected answer, but it's close enough that I've assumed you made a transcription mistake. Note that we have to assume that we can take the NA in df$b[1] as being zero or we get NA all the way down.
If you have heaps of data or need to do this a bunch of times, the speed could be improved by implementing the code in C++ and calling it from R.
The second method uses the R function sapply
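The form you present the problem in,
$$b_i = \frac{1}{14} A_i + \frac{13}{14} b_{i-1},$$
is recursive, which makes it impossible to vectorise directly; however, we can do some maths and find that (taking $b_0 = 0$) it is equivalent to
$$b_i = \frac{1}{14} \sum_{k=1}^{i} \left(\frac{13}{14}\right)^{i-k} A_k.$$
We can then write a function which calculates b_i and use sapply to calculate each element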
calc_b <- function(n,A){
(1/14)*sum((13/14)^(n-1:n)*A[1:n])
}
df2 <- data.frame(A = seq(10,25,5))
df2$b <- sapply(seq_along(df2$A), calc_b, df2$A)
df2
A b
1 10 0.7142857
2 15 1.7346939
3 20 3.0393586
4 25 4.6079758
Note: We need to drop the first row (where A = 5) in order for the calculation to perform correctly.
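As a quick check that the recursion and the closed-form sum agree, the sketch below computes both on the same vector. Setting the first input to zero (as the filter answer does with replace()) is an assumption that stands in for dropping the first row; `b_loop`, `b_closed` and `b_filter` are illustrative names.

```r
A <- seq(5, 25, 5)

# Recursive definition, with b[1] taken as 0.
b_loop <- numeric(length(A))
for (i in 2:length(A)) {
  b_loop[i] <- A[i] / 14 + 13 / 14 * b_loop[i - 1]
}

# Closed form: a weighted sum over all earlier inputs, with A[1] zeroed out.
A0 <- replace(A, 1, 0)
b_closed <- sapply(seq_along(A0), function(n) {
  k <- seq_len(n)
  sum((13 / 14)^(n - k) * A0[k]) / 14
})

# The same recursion written as a recursive filter from the stats package.
b_filter <- as.numeric(stats::filter(A0 / 14, 13 / 14, method = "recursive"))

all.equal(b_loop, b_closed)
all.equal(b_loop, b_filter)
```

The closed form recomputes the whole sum for every element, so it does quadratic work; the loop and the filter each make a single pass.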
Horrible title question, but this is what I am trying to achieve. For Table1 I want to add the Column "BETWEEN", verifying if the "POSITION" falls between any of the "START" and "STOP" values for the corresponding "BIN" in Table2.
Table1. BIN names (character) and POSITION in BIN (numeric):
BIN POSITION
1 12
1 52
1 86
7 6
7 22
X 112
X 139
MT 3
MT 26
Table2: BIN name (character) and START and STOP positions (numeric)
BIN START STOP
1 2 64
1 90 110
7 20 100
7 105 200
X 1 5
MT 1 1000
And the desired result - Table 1 with "BETWEEN":
BIN POSITION BETWEEN
1 12 TRUE
1 52 TRUE
1 86 FALSE
7 6 FALSE
7 22 TRUE
X 112 FALSE
X 139 FALSE
MT 3 TRUE
MT 26 TRUE
My Table 1 has about 4,000,000 rows, and Table 2 about 500,000 rows, and anything I came up with was very slow.
As an example of bigger tables, use the following:
positions <- seq(1,100000,10)
bins <- c("A","B","C","D","E","F","G","H","I","J")
tab1 <- data.table(bin = rep(bins,1,each=length(positions)), pos = rep(positions,10))
tab2 <- data.table(bin = rep(bins,1,each=2000), start = seq(5,100000,50))
tab2[, stop := start + 25]  # the start column is not visible inside data.table(), so add stop afterwards
The desired output would be:
tab1
bin pos between
1: A 1 FALSE
2: A 11 TRUE
3: A 21 TRUE
4: A 31 FALSE
5: A 41 FALSE
The following method requires that, for a given bin, the intervals are mutually exclusive (e.g. you can't have bin A with bounds 1-5 and another bin A interval with bounds 4-8). Also, I modified your example a bit.
positions <- seq(1,100000,10)
bins <- c("A","B","C","D","E","F","G","H","I","J")
tab1 <- data.table(bin = rep(bins,1,each=length(positions)), pos = rep(positions,10))
setkey(tab1,"bin","pos")
tab2 <- data.table(bin = rep(bins,1,each=2000), start = seq(5,100000,50))
tab2[, end := start+25]
tab2[,pos:=start]
setkey(tab2,"bin","pos")
x<-tab2[tab1, roll=TRUE, nomatch=0]
tab2[,pos:=end]
setkey(tab2,"bin","pos")
y<-tab2[tab1, roll=-Inf, nomatch=0]
setkey(x,"bin","pos","start")
setkey(y,"bin","pos","start")
inBin<-x[y,nomatch=0]
inBin[, between:=TRUE]
setkey(tab1,"bin","pos")
setkey(inBin,"bin","pos")
result<-inBin[,list(bin,pos,between)][tab1]
result[is.na(between), between:=FALSE]
I don't have the time to explain my solution in depth right now. Instead I'll take the cheap way out and refer you to the documentation of the roll parameter of data.table. The basic methodology above is that I'm joining tab1 and tab2, rolling pos forward to the nearest end bound. Then I join tab1 and tab2, rolling pos backward to the nearest start bound. Then I do an inner join on those two sets, giving me all rows in tab1 which fall inside the bounds of a bin. From that point, it's just grunt work.
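Versions of data.table that support non-equi joins can express the interval test directly in `on`. The following is a sketch on the small tables from the question, not a benchmarked replacement for the rolling joins above; the column names `pos`, `start` and `end` follow this answer's naming.

```r
library(data.table)

tab1 <- data.table(
  bin = c("1", "1", "1", "7", "7", "X", "X", "MT", "MT"),
  pos = c(12, 52, 86, 6, 22, 112, 139, 3, 26)
)
tab2 <- data.table(
  bin   = c("1", "1", "7", "7", "X", "MT"),
  start = c(2, 90, 20, 105, 1, 1),
  end   = c(64, 110, 100, 200, 5, 1000)
)

# Default everything to FALSE, then flag the rows of tab1 whose pos lies
# in any [start, end] interval of the same bin (update join by reference).
tab1[, between := FALSE]
tab1[tab2, on = .(bin, pos >= start, pos <= end), between := TRUE]
tab1
```

Unlike the rolling-join approach, this form does not rely on the intervals within a bin being mutually exclusive.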
The most straightforward approach, I think, is to nest the matching loops. You may need to handle factors slightly differently. I haven't tested to see what happens if it does not find a bin match.
BIN <- c("1","1","1","7","7","X","X","MT","MT")
POSITION <- c(12,52,86,6,22,112,139,3,26)
npos <- length(POSITION)
BETWEEN <- vector(mode="logical",length=npos)
tab1 <- as.data.frame(cbind(BIN,POSITION))
BIN2 <- c("1","1","7","7","X","MT")
START <- c(2,90,20,105,1,1)
STOP <- c(64,110,100,200,5,1000)
tab2 <- as.data.frame(cbind(BIN2,START,STOP))
bins <- unique(tab1$BIN)
for(bin in bins){
#print(paste("bin=",bin))
t1.bin.matches <- which(tab1$BIN==bin)
t2.bin.compares <- which(tab2$BIN2==bin)
#print(t1.bin.matches)
#print(t2.bin.compares)
for(match in t1.bin.matches){
between = FALSE
candidate = as.numeric(as.vector(tab1$POSITION)[match])
for(compare in t2.bin.compares){
comp.start <- as.numeric(as.vector(tab2$START)[compare])
comp.stop <- as.numeric(as.vector(tab2$STOP)[compare])
if(candidate>=comp.start&&candidate<=comp.stop){
between = TRUE
break
}
}
#print(paste(comp.start,candidate,comp.stop,between))
BETWEEN[match] = between
}
}
tab1 <- as.data.frame(cbind(tab1,BETWEEN))
tab1
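The two inner loops can be collapsed into one vectorised comparison per bin with outer(). This is a base R sketch on the same sample data; it builds a positions-by-intervals matrix for each bin, so memory use grows quickly on tables of the size mentioned in the question.

```r
tab1 <- data.frame(
  BIN = c("1", "1", "1", "7", "7", "X", "X", "MT", "MT"),
  POSITION = c(12, 52, 86, 6, 22, 112, 139, 3, 26),
  stringsAsFactors = FALSE
)
tab2 <- data.frame(
  BIN = c("1", "1", "7", "7", "X", "MT"),
  START = c(2, 90, 20, 105, 1, 1),
  STOP = c(64, 110, 100, 200, 5, 1000),
  stringsAsFactors = FALSE
)

tab1$BETWEEN <- FALSE
for (bin in unique(tab1$BIN)) {
  rows <- which(tab1$BIN == bin)
  iv <- tab2[tab2$BIN == bin, ]
  # hit[i, j] is TRUE when position i lies inside interval j of this bin.
  hit <- outer(tab1$POSITION[rows], iv$START, ">=") &
    outer(tab1$POSITION[rows], iv$STOP, "<=")
  # A position counts as "between" if it falls in at least one interval.
  tab1$BETWEEN[rows] <- rowSums(hit) > 0
}
tab1
```

Building the data frames directly, rather than through cbind(), keeps POSITION, START and STOP numeric, so no as.numeric() conversions are needed inside the loop.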
Make sure your BIN columns are character, and that POSITION, START and STOP are numeric.
Table1$BIN = as.character(Table1$BIN)
Table1$POSITION = as.numeric(Table1$POSITION)
Table2$BIN = as.character(Table2$BIN)
Table2$START = as.numeric(Table2$START)
Table2$STOP = as.numeric(Table2$STOP)
Convert your data.frames to data.tables (library(data.table)), because the code below might otherwise be slow.
Table1 = as.data.table(Table1)
Table2 = as.data.table(Table2)
Generate desired output
z = apply(Table1, 1, function(x) {nrow(Table2[(as.numeric(x[2])>START) & (as.numeric(x[2])<STOP) & (BIN == as.character(x[1])),])>0})
cbind(Table1, z)
Old function is z(), new is y(). With the sample Table1, Table2, the new function is 30% faster. I don't know how this advantage will scale as nrow increases, but I'm guessing this scaling will be very positive. Let me know.
z = function(a){apply(Table1, 1, function(x) {z = subset(Table2, Table2$BIN == as.character(x[1]))
any(as.numeric(x[2])>z$START & as.numeric(x[2])<z$STOP)})}
y = function(a){apply(Table1, 1, function(x) {nrow(Table2[(as.numeric(x[2])>START) & (as.numeric(x[2])<STOP) & (BIN == as.character(x[1])),])>0})}
microbenchmark(z(), y(), times = 1000L)
expr min lq median uq max neval
z() 1168.283 1219.793 1237.791 1276.267 3481.576 1000
y() 809.575 848.052 863.257 885.909 1683.383 1000
edit: you might need to muck with the as.numeric, and as.character in the subsetting. I lost the data.table I created earlier and directly used the answer above's data.frame.
I am trying to simulate a simple game where you spin a spinner, labeled 1-5, and then progress on until you pass the finish line (spot 50). I am a bit new to R and have been working on this for a while searching for answers. When I run the code below, it doesn't add the numbers in sequence, it returns a list of my 50 random spins and their value. How do I get this to add the spins on top of each other, then stop once the total is >= 50?
SpacesOnSpinner<-(seq(1,5,by=1))
N<-50
L1<-integer(N)
for (i in 1:N){
takeaspin<-sample(SpacesOnSpinner,1,replace=TRUE)
L1[i]<-L1[i]+takeaspin
}
This is a good use-case for replicate. I'm not sure if you have to use a for loop, but you could do this instead (replicate is a loop too):
SpacesOnSpinner<-(seq(1,5,by=1))
N<-10
cumsum( replicate( N , sample(SpacesOnSpinner,1,replace=TRUE) ) )
#[1] 5 10 14 19 22 25 27 29 30 33
However, since you have a condition which you want to break on, perhaps the other answer with a while condition is exactly what you need in this case (people will tell you they are bad in R, but they have their uses). Using this method, you can see how many spins it took you to get past 50 by a simple subset afterwards (but you will not know in advance how many spins it will take, but at most it will be 50!):
N<-50
x <- cumsum( replicate( N , sample(5,1) ) )
# Value of the accumulator at each round while the total is still below 50
x[ x < 50 ]
#[1] 5 6 7 8 12 16 21 24 25 29 33 34 36 38 39 41 42 44 45 49
# Number of spins with the total still below 50
length(x[x < 50])
#[1] 20
Here is another interesting way to simulate your game, using a recursive function.
spin <- function(outcomes = 1:5, start = 0L, end = 50L)
if (start <= end)
c(got <- sample(outcomes, 1), Recall(outcomes, start + got, end))
spin()
# [1] 5 4 4 5 1 5 3 2 3 4 4 1 5 4 3
Although elegant, it won't be as fast as an improved version of Simon's solution that makes a single call to sample, as suggested by Viktor:
spin <- function(outcomes = 1:5, end = 50L) {
max.spins <- ceiling(end / min(outcomes))
x <- sample(outcomes, max.spins, replace = TRUE)
head(x, match(TRUE, cumsum(x) >= end))
}
spin()
# [1] 3 5 2 3 5 2 2 5 1 2 1 5 5 5 2 4
For your ultimate goal (find the probability of one person being in the lead for the entire game), it is debatable whether while will be more efficient or not: a while loop is certainly slower, but you may benefit from the possibility of exiting early as the lead switches from one player to the other. Both approaches are worth testing.
You can use a while statement and a variable total for keeping track of the sum:
total <- 0
while(total <= 50){
takeaspin<-sample(SpacesOnSpinner,1,replace=TRUE)
total <- takeaspin + total
}
print (total)
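If you also want the running totals the question asks about, the loop can record them as it goes. A small sketch (the function name `spin_until` is made up); it stops as soon as the total reaches 50 or more, whereas the loop above spins once more when the total is exactly 50.

```r
spin_until <- function(finish = 50L, faces = 1:5) {
  totals <- integer(0)
  total <- 0L
  while (total < finish) {
    total <- total + sample(faces, 1)  # one spin
    totals <- c(totals, total)         # remember the running total
  }
  totals
}

set.seed(42)   # only to make the run repeatable
path <- spin_until()
path           # running total after each spin
length(path)   # number of spins needed
```

The original for loop went wrong because `L1[i]` is still 0 when it is read, so each slot only ever holds one spin; keeping the sum in a separate variable fixes that.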