In R, generating every possible solution to a model, based on constraints

In R, I’m trying to generate a matrix that shows the results of a model together with the input values that produce them, all of which are constrained. I want every possible solution. An example model:
Model= a^2+b^2+c^2+d^2
Where:
20≤Model≤30
a=1
2 ≤b ≤3
2 ≤c ≤3
3 ≤d ≤4
I’d like the output to look like this:
[a] [b] [c] [d] [Model]
[1] 1 3 2 3 23
[2] 1 2 2 4 25
[3] 1 3 3 3 28
[4] 1 2 3 3 23
Order doesn't matter. I just want the full permutation of feasible [integer] values. Any packages or help you could point my way?
In my example case, I want to generate all possible inputs (a, b, c, d) that are valid under the parameters I set. I only want values of my output equation (Model) between 20 and 30. In this case, only 4 solutions are possible based on the criteria I'm setting.

Assuming you're only looking for integer solutions, you can use expand.grid()
dd <- expand.grid(a=1, b=2:3, c=2:3, d=3:4)
m <- with(dd, a^2+b^2+c^2+d^2)
inside <- function(x, a,b) a<=x & x<=b
cbind(dd, m)[inside(m, 20, 30),]
# a b c d m
# 2 1 3 2 3 23
# 3 1 2 3 3 23
# 4 1 3 3 3 28
# 5 1 2 2 4 25
# 6 1 3 2 4 30
# 7 1 2 3 4 30
(You said you want values <=30, but you seem to have left out the 30s in your example. You can change the inside() function if you want an open interval.)
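As a minimal sketch of the closed versus open interval choice mentioned above, the same grid can be filtered with subset(). The names grid, closed and open are only illustrative and do not come from the original answer.

```r
# Enumerate every integer combination allowed by the constraints.
grid <- expand.grid(a = 1, b = 2:3, c = 2:3, d = 3:4)
grid$Model <- with(grid, a^2 + b^2 + c^2 + d^2)

# Closed interval: 20 <= Model <= 30 (keeps the two rows with Model == 30).
closed <- subset(grid, Model >= 20 & Model <= 30)

# Open interval: 20 < Model < 30 (the four rows shown in the question).
open <- subset(grid, Model > 20 & Model < 30)

print(closed)
print(open)
```

Note that expand.grid() materialises every combination before filtering, so this approach is only practical while the product of the range sizes stays small.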

Related

Looping over 16 numbers, but excluding one each time

Using the following expression, I want to compute the influence of each data point in an election forecast data set (see bottom). My idea is to loop through the expression 16 times and print the result, each time leaving one x_i out to see how each of them influences the result. But I have no idea how to write this loop in R.
The expression is:
$$ \hat{b} = \frac{\sum_{i=1}^{n} (x_i-\bar{x})\,y_i}{\sum_{i=1}^{n} (x_i-\bar{x})^2} $$
And in R
betahat<- (sum(data$growth)-mean(data$growth))*data$vote/(sum(data$growth)-mean(data$growth))^2
print(betahat)
And the data is this
data <- read.table("https://raw.githubusercontent.com/avehtari/ROS-Examples/master/ElectionsEconomy/data/hibbs.dat", header = TRUE)
Expected functioning:
0 1 2 0 x x 0 1 2
1 2 4 first loop 1 2 4 second loop 1 x x etc.
2 3 6 ---> 2 3 6 ---> 2 3 6 --->
3 4 8 3 4 8 3 4 8
4 5 10 4 5 10 4 5 10
The first output should be something like
[1] 1.566974 2.029337 1.753535 2.155116 1.742644 2.170927 1.719807 1.570487 2.078876
[10] 1.895125 1.635485 1.923232 1.766184 1.800264 1.627404 1.826965
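One possible way to write the leave-one-out loop is sketched below. It is an illustration under assumptions, not a confirmed answer: the helper name slope is hypothetical, it implements the formula for \hat{b} as written above, and a small made-up data frame stands in for hibbs.dat so that the code runs without a network connection. With the real data, nrow(data) would be 16.

```r
# Made-up stand-in for the hibbs data (same column names: growth, vote).
data <- data.frame(growth = c(0.5, 1.2, 2.4, 3.1, 4.0),
                   vote   = c(46.0, 49.5, 52.1, 55.3, 58.0))

# Slope estimate from the formula: sum((x - mean(x)) * y) / sum((x - mean(x))^2).
slope <- function(x, y) sum((x - mean(x)) * y) / sum((x - mean(x))^2)

# Drop row i, recompute the slope on the remaining rows, collect the results.
loo <- sapply(seq_len(nrow(data)), function(i) {
  with(data[-i, ], slope(growth, vote))
})

full <- slope(data$growth, data$vote)
print(full)
print(loo - full)  # change in the slope when each point is left out
```

Negative indexing (data[-i, ]) does the "exclude one row" step, so no explicit bookkeeping of the remaining rows is needed.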

Sorting data frame by column, adding index within group

This question describes the setting for my question pretty well.
Instead of a second value however, I have a factor called algorithm. My data frame looks like the following (note the possibility of multiplicity of values even within their group):
algorithm <- c("global", "distributed", "distributed", "none", "global", "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)
df
algorithm v
1 global 5
2 distributed 2
3 distributed 6
4 none 7
5 global 3
6 global 1
7 distributed 10
8 none 2
9 none 2
I would like to sort the dataframe by v but get the ordering position for every entry with respect to its group (algorithm). This position should then be added to the original data frame (so I don't need to rearrange it) because I would like to plot the calculated position as x and the value as y using a ggplot (grouped by algorithm, e.g. every algorithm is one set of points).
So the result should look like this:
algorithm v groupIndex
1 global 5 3
2 distributed 2 1
3 distributed 6 2
4 none 7 3
5 global 3 2
6 global 1 1
7 distributed 10 3
8 none 2 1
9 none 2 2
So far I know I can order the data by algorithm first and then by value or the other way round. I guess in a second step I would have to calculate the index within each group? Is there an easy way to do that?
df[order(df$algorithm, df$v), ]
algorithm v
2 distributed 2
3 distributed 6
7 distributed 10
6 global 1
5 global 3
1 global 5
8 none 2
9 none 2
4 none 7
Edit: It is not guaranteed that there is the same number of entries for each group!
A double application of order in each group should cover it:
ave(df$v, df$algorithm, FUN=function(x) order(order(x)) )
#[1] 3 1 2 3 2 1 3 1 2
Which is also equivalent to:
ave(df$v, df$algorithm, FUN=function(x) rank(x,ties.method="first") )
#[1] 3 1 2 3 2 1 3 1 2
This in turn means you can take advantage of frank from data.table if you are concerned about speed:
library(data.table)
setDT(df)[, grpidx := frank(v,ties.method="first"), by=algorithm]
df
# algorithm v grpidx
#1: global 5 3
#2: distributed 2 1
#3: distributed 6 2
#4: none 7 3
#5: global 3 2
#6: global 1 1
#7: distributed 10 3
#8: none 2 1
#9: none 2 2
One way would be the following. You can order v values for each group by using with_order(), I think. You can assign ranks using row_number() in the function. In this way, you can skip a step to arrange your data for each group as you tried with order().
library(dplyr)
group_by(df, algorithm) %>%
mutate(groupInd = with_order(order_by = v, fun = row_number, x = v))
# algorithm v groupInd
# <fctr> <int> <int>
#1 global 5 3
#2 distributed 2 1
#3 distributed 6 2
#4 none 7 3
#5 global 3 2
#6 global 1 1
#7 distributed 10 3
#8 none 2 1
#9 none 2 2
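For completeness, here is a small base R sketch that attaches the within-group position to the original data frame without reordering it, which is what the question asks for. It reuses the ave() and rank() idea from the answers; the column name groupIndex is taken from the question's expected output.

```r
algorithm <- c("global", "distributed", "distributed", "none", "global",
               "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)

# Rank v within each algorithm; ties are broken by original row order.
# ave() returns the result in the original row order, so df is not rearranged.
df$groupIndex <- ave(df$v, df$algorithm,
                     FUN = function(x) rank(x, ties.method = "first"))
print(df)
```

Because ave() works group by group, this does not depend on the groups having the same number of entries.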

Merge data frames for Cohen's kappa

I'm trying to analyze some data using R, but I'm not very familiar with R (yet) and therefore I'm totally stuck.
What I try to do is manipulate my input data so I can use it to calculate Cohen's Kappa.
Now the problem is that for rater_1 I have several ratings for some of the items, and I need to select one. If rater_1 has given the same rating on an item as rater_2, then this rating should be chosen; if not, any rating from the list can be used.
I tried
unique(merge(rater_1, rater_2, all.x=TRUE))
which brings me close, but if the ratings between the two raters diverge, only one is kept.
So, my question is, how do I get from
item rating_1
1 3
2 5
3 4
item rating_2
1 2
1 3
2 4
2 1
2 2
3 4
3 2
to
item rating_1 rating_2
1 3 3
2 5 4
3 4 4
?
There are some fancy ways to do this, but I thought it might be helpful to combine a few basic techniques to accomplish this task. Usually, in your question, you should include some easy way to generate your data, like this:
# Create some sample data
set.seed(1)
id<-rep(1:50)
rater_1<-sample(1:5,50,replace=TRUE)
df1<-data.frame(id,rater_1)
id<-rep(1:50,each=2)
rater_2<-sample(1:5,100,replace=TRUE)
df2<-data.frame(id,rater_2)
Now, here is one simple technique for doing this.
# Merge together the data frames.
all.merged<-merge(df1,df2)
# id rater_1 rater_2
# 1 1 2 3
# 2 1 2 5
# 3 2 2 3
# 4 2 2 2
# 5 3 3 1
# 6 3 3 1
# Find the ones that are equal.
same.rating<-all.merged[all.merged$rater_2==all.merged$rater_1,]
# Consider id 44, sometimes they match twice.
# So remove duplicates.
same.rating<-same.rating[!duplicated(same.rating),]
# Find the ones that never matched.
not.same.rating<-all.merged[!(all.merged$id %in% same.rating$id),]
# Pick one. I chose to pick the maximum.
picked.rating<-aggregate(rater_2~id+rater_1,not.same.rating,max)
# Stick the two together.
result<-rbind(same.rating,picked.rating)
result<-result[order(result$id),] # Sort
# id rater_1 rater_2
# 27 1 2 5
# 4 2 2 2
# 33 3 3 1
# 44 4 5 3
# 281 5 2 4
# 11 6 5 5
A fancy way to do this would be like this:
same.or.random<-function(x) {
  # Indices of the rows where both raters agree (may be empty).
  matched<-which(x$rater_1==x$rater_2)
  if(length(matched)>0) x[matched[1],]
  else x[sample(nrow(x),1),]
}
merged<-merge(df1,df2)
do.call(rbind,by(merged,merged$id,same.or.random))
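The same idea can be tried on the small example from the question. This is only a sketch under assumptions: the data frames are typed in by hand from the tables in the question, and instead of picking a random non-matching rating it simply keeps whichever one comes first after sorting.

```r
rater_1 <- data.frame(item = 1:3, rating_1 = c(3, 5, 4))
rater_2 <- data.frame(item = c(1, 1, 2, 2, 2, 3, 3),
                      rating_2 = c(2, 3, 4, 1, 2, 4, 2))

# One row per (item, rating_2) pair, with rating_1 repeated alongside.
merged <- merge(rater_1, rater_2)

# Within each item, put agreeing rows first (FALSE sorts before TRUE),
# then keep only the first row per item.
merged <- merged[order(merged$item, merged$rating_1 != merged$rating_2), ]
result <- merged[!duplicated(merged$item), ]
print(result)
```

Items where the raters agree get the matching rating; for items without any match (item 2 here) one of the available ratings is kept.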

Create sequence of repeated values, in sequence?

I need a sequence of repeated numbers, i.e. 1 1 ... 1 2 2 ... 2 3 3 ... 3 etc. The way I implemented this was:
nyear <- 20
names <- c(rep(1,nyear),rep(2,nyear),rep(3,nyear),rep(4,nyear),
rep(5,nyear),rep(6,nyear),rep(7,nyear),rep(8,nyear))
which works, but is clumsy, and obviously doesn't scale well.
How do I repeat the N integers M times each in sequence?
I tried nesting seq() and rep() but that didn't quite do what I wanted.
I can obviously write a for-loop to do this, but there should be an intrinsic way to do this!
You missed the each= argument to rep():
R> n <- 3
R> rep(1:5, each=n)
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
R>
so your example can be done with a simple
R> rep(1:8, each=20)
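A short sketch of the general case, "N integers, M times each", with a check against the hand-written version from the question. The variable names N, M, names_vec, by_hand and cycled are illustrative only.

```r
N <- 8   # number of distinct integers
M <- 20  # repetitions of each integer
names_vec <- rep(seq_len(N), each = M)

# Same values as the hand-written version from the question.
by_hand <- c(rep(1, M), rep(2, M), rep(3, M), rep(4, M),
             rep(5, M), rep(6, M), rep(7, M), rep(8, M))
stopifnot(all(names_vec == by_hand))

# For contrast, times = M repeats the whole sequence M times: 1 2 ... 8 1 2 ...
cycled <- rep(seq_len(N), times = M)
```

The difference between each= and times= is the usual stumbling block: each= repeats every element in place, while times= repeats the vector as a whole.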
Another base R option could be gl():
gl(5, 3)
Where the output is a factor:
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
Levels: 1 2 3 4 5
If integers are needed, you can convert it:
as.integer(gl(5, 3))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
For your example, Dirk's answer is perfect. If you instead had a data frame and wanted to add that sort of sequence as a column, you could also use group from groupdata2 (disclaimer: my package) to greedily divide the datapoints into groups.
# Attach groupdata2
library(groupdata2)
# Create a random data frame
df <- data.frame("x" = rnorm(27))
# Create groups with 5 members each (except last group)
group(df, n = 5, method = "greedy")
x .groups
<dbl> <fct>
1 0.891 1
2 -1.13 1
3 -0.500 1
4 -1.12 1
5 -0.0187 1
6 0.420 2
7 -0.449 2
8 0.365 2
9 0.526 2
10 0.466 2
# … with 17 more rows
There's a whole range of methods for creating this kind of grouping factor. E.g. by number of groups, a list of group sizes, or by having groups start when the value in some column differs from the value in the previous row (e.g. if a column is c("x","x","y","z","z") the grouping factor would be c(1,1,2,3,3)).

Calculating the occurrences of numbers in the subsets of a data.frame

I have a data frame in R which is similar to the following. Actually my real ’df’ data frame is much bigger than this one, but I do not want to confuse anybody, so I have tried to simplify things as much as possible.
So here’s the data frame.
id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <-data.frame(id,a,b,c,d,e)
df
Basically what I would like to do is to get the occurrences of numbers for each column (a,b,c,d,e) and for each id group (1,2,3) (for this latter grouping see my column ’id’).
So, for column ’a’ and for id number ’1’ (for the latter see column ’id’) the code would be something like this:
as.numeric(table(df[1:10,2]))
##The results are:
[1] 3 7
Just to briefly explain my results: in column ’a’ (and regarding only those records which have number ’1’ in column ’id’) we can say that number '1' occurred 3 times and number '3' occurred 7 times.
Again, just to show you another example. For column ’a’ and for id number ’2’ (for the latter grouping see again column ’id’):
as.numeric(table(df[11:20,2]))
##After running the codes the results are:
[1] 4 3 3
Let me explain a little again: in column ’a’ (and regarding only those observations which have number ’2’ in column ’id’) we can say that number '1' occurred 4 times, number '2' occurred 3 times and number '3' occurred 3 times.
So this is what I would like to do: calculate the occurrences of numbers for each custom-defined subset (and then collect these values into a data frame). I know it is not a difficult task but the PROBLEM is that I’m gonna have to change the input ’df’ data frame on a regular basis and hence both the overall number of rows and columns might change over time…
What I have done so far is that I have separated the ’df’ dataframe by columns, like this:
for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z])
So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c etc. But I’m really stuck now and I don’t know how to move forward…
Is there a proper, ”automatic” way to solve this problem?
How about -
> library(reshape)
> dftab <- table(melt(df,'id'))
> dftab
, , value = 1
variable
id a b c d e
1 3 8 2 2 4
2 4 6 3 2 4
3 4 2 1 5 1
, , value = 2
variable
id a b c d e
1 0 1 4 3 3
2 3 3 3 6 2
3 1 4 5 3 4
, , value = 3
variable
id a b c d e
1 7 1 4 5 3
2 3 1 4 2 4
3 5 4 4 2 5
So to get the number of '3's in column 'a' and group '1'
you could just do
> dftab['1','a','3']
[1] 7
A combination of tapply and apply can create the data you want:
tapply(df$id,df$id,function(x) apply(df[id==x,-1],2,table))
However, when a grouping doesn't have all the elements in it, as in 1a, the result will be a list for that id group rather than a nice table (matrix).
$`1`
$`1`$a
1 3
3 7
$`1`$b
1 2 3
8 1 1
$`1`$c
1 2 3
2 4 4
$`1`$d
1 2 3
2 3 5
$`1`$e
1 2 3
4 3 3
$`2`
a b c d e
1 4 6 3 2 4
2 3 3 3 6 2
3 3 1 4 2 4
$`3`
a b c d e
1 4 2 1 5 1
2 1 4 5 3 4
3 5 4 4 2 5
I'm sure someone will have a more elegant solution than this, but you can cobble it together with a simple function and dlply from the plyr package.
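library(plyr)
# Build one frequency table per non-id column of the data frame.
ColTables <- function(df) {
  counts <- list()
  for(a in names(df)[names(df) != "id"]) {
    counts[[a]] <- table(df[a])
  }
  return(counts)
}
# Apply ColTables separately to the rows of each id.
results <- dlply(df, "id", ColTables)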
This gets you back a list - the first "layer" of the list will be the id variable; the second the table results for each column for that id variable. For example:
> results[['2']]['a']
$a
1 2 3
4 3 3
For id variable = 2, column = a, per your above example.
A way to do it is using the aggregate function, but you have to add a column to your dataframe
> df$freq <- 0
> aggregate(freq~a+id,df,length)
a id freq
1 1 1 3
2 3 1 7
3 1 2 4
4 2 2 3
5 3 2 3
6 1 3 4
7 2 3 1
8 3 3 5
Of course you can write a function to do it, so it's easier to do it frequently, and you don't have to add a column to your actual data frame
> frequency <- function(df,groups) {
+ relevant <- df[,groups]
+ relevant$freq <- 0
+ aggregate(freq~.,relevant,length)
+ }
> frequency(df,c("b","id"))
b id freq
1 1 1 8
2 2 1 1
3 3 1 1
4 1 2 6
5 2 2 3
6 3 2 1
7 1 3 2
8 2 3 4
9 3 3 4
You didn't say how you'd like the data. The by function might give you the output you like.
by(df, df$id, function(x) lapply(x[,-1], table))
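Since the question asks for the counts to be collected into a data frame and for the code to keep working when rows or columns change, here is a base R sketch that produces one long table of counts. It is an illustration under assumptions: the small data frame below is made up (not the one from the question), and the names value_cols, long and counts are hypothetical.

```r
# Made-up example data: an id column plus any number of value columns.
df <- data.frame(id = rep(1:3, each = 4),
                 a = c(3, 1, 3, 3, 2, 2, 1, 3, 1, 1, 1, 3),
                 b = c(1, 1, 2, 1, 3, 3, 3, 1, 2, 2, 1, 1))

# Every column except id is treated as a value column.
value_cols <- setdiff(names(df), "id")

# Stack the value columns into long form: one row per (id, variable, value).
long <- data.frame(id = rep(df$id, times = length(value_cols)),
                   variable = rep(value_cols, each = nrow(df)),
                   value = unlist(df[value_cols], use.names = FALSE))

# Count every combination; combinations that never occur get n = 0.
counts <- as.data.frame(table(long), responseName = "n")
print(head(counts))
```

Because the value columns are found with setdiff(), nothing needs to be edited when columns are added to or removed from the input.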
