Looping over 16 numbers, but excluding one each time - r

Using the following expression, I want to compute the influence of each data point in an election forecast data set (see bottom). My idea is to evaluate the expression 16 times and print the result, each time leaving one x_i out, to see how each point influences the result. But I have no idea how to write this loop in R.
The expression is:
$$ \hat{b} = \frac{\sum_{i=1}^{n} (x_i-\bar{x})\,y_i}{\sum_{i=1}^{n} (x_i-\bar{x})^2} $$
And in R
betahat<- (sum(data$growth)-mean(data$growth))*data$vote/(sum(data$growth)-mean(data$growth))^2
print(betahat)
And the data is this
data <- read.table("https://raw.githubusercontent.com/avehtari/ROS-Examples/master/ElectionsEconomy/data/hibbs.dat", header = TRUE)
Expected functioning:
0 1 2                   0 x x                    0 1 2
1 2 4    first loop     1 2 4    second loop     1 x x    etc.
2 3 6       --->        2 3 6       --->         2 3 6    --->
3 4 8                   3 4 8                    3 4 8
4 5 10                  4 5 10                   4 5 10
The first output should be something like
[1] 1.566974 2.029337 1.753535 2.155116 1.742644 2.170927 1.719807 1.570487 2.078876
[10] 1.895125 1.635485 1.923232 1.766184 1.800264 1.627404 1.826965
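One way to set up the leave-one-out loop is sketched below. It follows the formula as written in the question. Note that the R line in the question applies sum() to data$growth before subtracting the mean, which does not match the formula, so the numbers it prints are not leave-one-out estimates. The helper names slope and loo_slope are made up for illustration, and the toy vectors echo the small table in the "Expected functioning" sketch rather than the election data.

```r
# Slope estimate from the formula: sum((x - mean(x)) * y) / sum((x - mean(x))^2)
slope <- function(x, y) {
  sum((x - mean(x)) * y) / sum((x - mean(x))^2)
}

# Leave-one-out: recompute the slope n times, dropping observation i each time.
# x[-i] and y[-i] are the vectors with element i removed.
loo_slope <- function(x, y) {
  sapply(seq_along(x), function(i) slope(x[-i], y[-i]))
}

# Toy vectors (illustrative only, not the election data)
x <- 1:5
y <- c(2, 4, 6, 8, 10)

loo <- loo_slope(x, y)
print(loo)
```

With the election data this would presumably be called as loo_slope(data$growth, data$vote), returning one estimate per omitted row.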

Related

Generate three level dependency in case a verb is attached with non verb in dependency parsing

I am using dependency parsing for a use case in R with the coreNLP package. However, I need to tweak the data frame for a specific use case.
I need a data frame with three columns. I used the code below to get as far as the dependency tree.
devtools::install_github("statsmaths/coreNLP")
coreNLP::downloadCoreNLP()
library(coreNLP)  # attach the package so initCoreNLP() and the calls below are found
initCoreNLP()
inp_cl = "generate odd numbers from column one and print."
output = annotateString(inp_cl)
dc = getDependency(output)
   sentence governor dependent      type governorIdx dependentIdx govIndex depIndex
1         1     ROOT  generate      root           0            1       NA        1
2         1  numbers       odd      amod           3            2        3        2
3         1 generate   numbers      dobj           1            3        1        3
4         1   column      from      case           5            4        5        4
5         1 generate    column nmod:from           1            5        1        5
6         1   column       one    nummod           5            6        5        6
7         1   column       and        cc           5            7        5        7
8         1 generate     print nmod:from           1            8        1        8
9         1   column     print  conj:and           5            8        5        8
10        1 generate         .     punct           1            7        1       10
Using POS tagging with the following code, I ended up with the following data frame.
ps = getToken(output)
ps = ps[, c(1, 2, 7, 3)]
colnames(dc)[8] = "id"
dp = merge(dc, ps[, c("sentence", "id", "POS")],
           by.x = c("sentence", "governorIdx"), by.y = c("sentence", "id"), all.x = TRUE)
dp = merge(dp, ps[, c("sentence", "id", "POS")],
           by.x = c("sentence", "dependentIdx"), by.y = c("sentence", "id"), all.x = TRUE)
colnames(dp)[9:10] = c("POS_gov", "POS_dep")
   sentence dependentIdx governorIdx governor dependent      type govIndex id POS_gov POS_dep
1         1            1           0     ROOT  generate      root       NA  1    <NA>      VB
2         1            2           3  numbers       odd      amod        3  2     NNS      JJ
3         1            3           1 generate   numbers      dobj        1  3      VB     NNS
4         1            4           5   column      from      case        5  4      NN      IN
5         1            5           1 generate    column nmod:from        1  5      VB      NN
6         1            6           5   column       one    nummod        5  6      NN      CD
7         1            7           5   column       and        cc        5  7      NN      CC
8         1            8           1 generate     print nmod:from        1  8      VB      NN
9         1            8           5   column     print  conj:and        5  8      NN      NN
10        1            9           1 generate         .     punct        1  9      VB       .
If a verb (action word) is attached to a non-verb (non-action word), and that non-verb is in turn connected to other non-verbs, then one row should capture the entire connection. E.g., generate is a verb connected to numbers, and numbers is a non-verb connected to odd.
So the intended data frame needs to be
Topic1 Topic2 Action
numbers odd generate
column from generate
column one generate
column and generate
column from print
column one print
column and print
. generate
First you'll need to have your dependency tree tag print as a verb, rather than a noun.
Try using a sentence with two independent clauses, and see if the root of the second independent clause is tagged as such.
If so, it's a simple walk through the governorIdx column. If not, you'll need to address the mechanics of your dependency tree generator.
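The "walk through the governorIdx column" could look roughly like the sketch below. It is only an illustration under stated assumptions: the edge list is typed in by hand from the merged table in the question (so it runs without coreNLP), any POS tag starting with "VB" is treated as a verb, and the names is_verb, verbs, kids and chains are made up for this example.

```r
# Edge list copied by hand from the merged table in the question.
dp <- data.frame(
  governorIdx  = c(0, 3, 1, 5, 1, 5, 5, 1, 5, 1),
  dependentIdx = c(1, 2, 3, 4, 5, 6, 7, 8, 8, 9),
  governor  = c("ROOT", "numbers", "generate", "column", "generate",
                "column", "column", "generate", "column", "generate"),
  dependent = c("generate", "odd", "numbers", "from", "column",
                "one", "and", "print", "print", "."),
  POS_gov = c(NA, "NNS", "VB", "NN", "VB", "NN", "NN", "VB", "NN", "VB"),
  POS_dep = c("VB", "JJ", "NNS", "IN", "NN", "CD", "CC", "NN", "NN", "."),
  stringsAsFactors = FALSE
)

# Assumption: a tag starting with "VB" marks a verb; NA counts as non-verb.
is_verb <- function(pos) !is.na(pos) & startsWith(pos, "VB")

# Level 1: verb -> non-verb edges (Action -> Topic1).
lvl1 <- dp[is_verb(dp$POS_gov) & !is_verb(dp$POS_dep), ]
verbs <- data.frame(idx = lvl1$dependentIdx, Topic1 = lvl1$dependent,
                    Action = lvl1$governor, stringsAsFactors = FALSE)

# Level 2: non-verb -> non-verb edges (Topic1 -> Topic2).
lvl2 <- dp[!is.na(dp$POS_gov) & !is_verb(dp$POS_gov) & !is_verb(dp$POS_dep), ]
kids <- data.frame(idx = lvl2$governorIdx, Topic2 = lvl2$dependent,
                   stringsAsFactors = FALSE)

# Join the two levels on the token index of the shared non-verb.
chains <- merge(verbs, kids, by = "idx")[, c("Topic1", "Topic2", "Action")]
print(chains)
```

With the tags shown in the question, print is tagged as a noun, so it appears as a Topic2 of column rather than as an Action. That is the tagging issue described above, and it has to be resolved in the parser output before the print rows of the intended data frame can appear.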

In R, generating every possible solution to a model, based on constraints

In R, I'm trying to generate a matrix that shows the results of a model together with the input values that produce them, all of which are constrained. I want every possible solution. An example model:
Model= a^2+b^2+c^2+d^2
Where:
20≤Model≤30
a=1
2 ≤b ≤3
2 ≤c ≤3
3 ≤d ≤4
I’d like the output to look like this:
[a] [b] [c] [d] [Model]
[1] 1 3 2 3 23
[2] 1 2 2 4 25
[3] 1 3 3 3 28
[4] 1 2 3 3 23
Order doesn't matter. I just want the full permutation of feasible [integer] values. Any packages or help you could point my way?
In my example case, I want to generate all possible inputs (a,b,c,d) that are valid under the parameters I set. I only want values of my output equation (Model) between 20 and 30. In this case, only 4 solutions are possible based on the criteria I'm setting.
Assuming you're only looking for integer solutions, you can use expand.grid()
dd <- expand.grid(a=1, b=2:3, c=2:3, d=3:4)
m <- with(dd, a^2+b^2+c^2+d^2)
inside <- function(x, a,b) a<=x & x<=b
cbind(dd, m)[inside(m, 20, 30),]
# a b c d m
# 2 1 3 2 3 23
# 3 1 2 3 3 23
# 4 1 3 3 3 28
# 5 1 2 2 4 25
# 6 1 3 2 4 30
# 7 1 2 3 4 30
(You said you want values <= 30, but you seem to have left the 30s out of your example. You can change the inside() function if you want an open interval.)
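For the open-interval reading (20 < Model < 30), a minimal variation of the same expand.grid() approach is sketched here. The column name Model and the variable name strict are chosen for illustration, and subset() is used in place of the inside() helper.

```r
# Enumerate every integer combination allowed by the per-variable bounds.
dd <- expand.grid(a = 1, b = 2:3, c = 2:3, d = 3:4)

# Evaluate the model for each combination.
dd$Model <- with(dd, a^2 + b^2 + c^2 + d^2)

# Keep only rows strictly inside (20, 30), which drops the two rows equal to 30.
strict <- subset(dd, Model > 20 & Model < 30)
print(strict)
```

This leaves the four rows listed in the question's desired output.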

Divide each row by a different number

I've looked on the internet but haven't found the answer I'm looking for, though I'm sure it's out there...
I have a data frame, and I want to divide (or apply any other operation to) every cell of a row by the value in the second column of that row.
So for the first row, divide each cell from col3 to the last column by that row's col2 value, and so on for every row.
I have solved this with a for loop: col2 (delta) is now a vector, and col3 to the end is a data.frame (mu). The results are appended to a new data frame using rbind.
The question is: I'm pretty sure this can be done with apply, sapply or similar, but so far I haven't gotten the results I'm after (the ones I get with the for loop). How can I do it without a for loop?
In summary, I want to divide each mu by the delta value of its own row. This is the for loop I've been using so far:
RA <- NULL  # start empty so rbind() has something to append to
for (i in 1:(dim(mu)[1])){
  RA_row <- mu[i,]/delta[i]
  RA <- rbind(RA, RA_row)
}
transcript delta mu_5 mu_15 mu_25 mu_35 mu_45 mu_55 mu_65
1 YAL001C 0.066702720 2.201787e-01 1.175731e-01 2.372506e-01 0.139281317 0.081723456 1.835414e-01 1.678318e-01
2 YAL002W 0.106000180 3.685822e-01 1.326865e-01 2.887973e-01 0.158207858 0.193476082 1.867039e-01 1.776946e-01
3 YAL003W 0.022119345 2.271518e+00 2.390637e+00 1.651997e+00 3.802739732 2.733559839 2.772454e+00 3.571712e+00
Thanks
It appears as though you want just:
mu2 <- mu[-(1:2)]/mu[[2]]
# same as mu[-(1:2)]/mu[['delta']]
That should produce a new dataframe with the division by row. Somewhat more dangerous would be to do the division "in place".
mu[-(1:2)] <- mu[-(1:2)]/mu[[2]]
> mu <- data.frame(a=1,b=1:10, c=rnorm(10), d=rnorm(10) )
> mu
a b c d
1 1 1 -1.91435943 0.45018710
2 1 2 1.17658331 -0.01855983
3 1 3 -1.66497244 -0.31806837
4 1 4 -0.46353040 -0.92936215
5 1 5 -1.11592011 -1.48746031
6 1 6 -0.75081900 -1.07519230
7 1 7 2.08716655 1.00002880
8 1 8 0.01739562 -0.62126669
9 1 9 -1.28630053 -1.38442685
10 1 10 -1.64060553 1.86929062
> (mu2 <- mu[-(1:2)]/mu[[2]])
c d
1 -1.914359426 0.450187101
2 0.588291656 -0.009279916
3 -0.554990812 -0.106022792
4 -0.115882600 -0.232340537
5 -0.223184021 -0.297492062
6 -0.125136500 -0.179198716
7 0.298166649 0.142861258
8 0.002174452 -0.077658337
9 -0.142922281 -0.153825205
10 -0.164060553 0.186929062
> (mu[-(1:2)] <- mu[-(1:2)]/mu[[2]] )
> mu
a b c d
1 1 1 -1.914359426 0.450187101
2 1 2 0.588291656 -0.009279916
3 1 3 -0.554990812 -0.106022792
4 1 4 -0.115882600 -0.232340537
5 1 5 -0.223184021 -0.297492062
6 1 6 -0.125136500 -0.179198716
7 1 7 0.298166649 0.142861258
8 1 8 0.002174452 -0.077658337
9 1 9 -0.142922281 -0.153825205
10 1 10 -0.164060553 0.186929062
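The direct division works because dividing a data frame by a vector operates column by column, so element i of the vector lines up with row i. The sketch below checks this against sweep(), which states the row-wise intent explicitly. The small data frame reuses the first two mu columns from the question's table; the names direct and swept are illustrative.

```r
# First three transcripts and two mu columns from the question's table.
df <- data.frame(
  transcript = c("YAL001C", "YAL002W", "YAL003W"),
  delta = c(0.066702720, 0.106000180, 0.022119345),
  mu_5  = c(0.2201787, 0.3685822, 2.271518),
  mu_15 = c(0.1175731, 0.1326865, 2.390637)
)

# Direct division: each column is divided element-wise by delta.
direct <- df[-(1:2)] / df$delta

# Same operation via sweep(): MARGIN = 1 applies delta along the rows.
swept <- sweep(as.matrix(df[-(1:2)]), 1, df$delta, "/")

print(direct)
print(swept)
```

Either form avoids growing a data frame with rbind() inside a loop.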

R table function

If I have a vector numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4), and I use 'table(numbers)', I get
names 1 2 4 5
counts 2 5 4 1
What if I want it to include 3 as well, or more generally all numbers from 1:max(numbers), even if they do not occur in numbers? How would I generate output like this:
names 1 2 3 4 5
counts 2 5 0 4 1
If you want R to count values that aren't there, you should create a factor and explicitly set the levels. table will return a count for each level.
table(factor(numbers, levels=1:max(numbers)))
# 1 2 3 4 5
# 2 5 0 4 1
For this particular example (positive integers), tabulate would also work:
numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4)
tabulate(numbers)
# [1] 2 5 0 4 1
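The "positive integers" caveat matters in practice: tabulate() silently ignores values that are not positive, while the factor approach counts whatever levels are listed. A small sketch with a made-up vector x that contains a zero:

```r
# Illustrative vector containing a zero (not from the question).
x <- c(0, 1, 1, 3)

# tabulate() only counts bins 1..max(x); the 0 is dropped without warning.
tab_counts <- tabulate(x)

# The factor approach counts every listed level, including 0 and the absent 2.
fac_counts <- table(factor(x, levels = 0:max(x)))

print(tab_counts)
print(fac_counts)
```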

Calculating the occurrences of numbers in the subsets of a data.frame

I have a data frame in R similar to the one below. My real 'df' data frame is much bigger, but I have simplified it as much as possible to avoid confusion.
So here’s the data frame.
id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2)
d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2)
e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3)
df <-data.frame(id,a,b,c,d,e)
df
Basically, what I would like to do is count the occurrences of each number for each column (a,b,c,d,e) within each id group (1,2,3), where the groups are given by my column 'id'.
So, for column 'a' and id number '1', the code would be something like this:
as.numeric(table(df[1:10,2]))
##The results are:
[1] 3 7
Just to briefly explain my results: in column 'a' (considering only those records which have number '1' in column 'id'), number '1' occurred 3 times and number '3' occurred 7 times.
Here is another example, for column 'a' and id number '2':
as.numeric(table(df[11:20,2]))
##After running the codes the results are:
[1] 4 3 3
To explain again: in column 'a' (considering only those observations which have number '2' in column 'id'), number '1' occurred 4 times, number '2' occurred 3 times and number '3' occurred 3 times.
So this is what I would like to do: calculate the occurrences of numbers for each custom-defined subset (and then collect these values into a data frame). I know it is not a difficult task, but the PROBLEM is that I will have to change the input 'df' data frame on a regular basis, so both the overall number of rows and columns might change over time...
What I have done so far is split the 'df' data frame by columns, like this:
for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z])
So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c, etc. But I'm really stuck now and I don't know how to move forward...
Is there a proper, "automatic" way to solve this problem?
How about -
> library(reshape)
> dftab <- table(melt(df,'id'))
> dftab
, , value = 1
variable
id a b c d e
1 3 8 2 2 4
2 4 6 3 2 4
3 4 2 1 5 1
, , value = 2
variable
id a b c d e
1 0 1 4 3 3
2 3 3 3 6 2
3 1 4 5 3 4
, , value = 3
variable
id a b c d e
1 7 1 4 5 3
2 3 1 4 2 4
3 5 4 4 2 5
So to get the number of '3's in column 'a' and group '1'
you could just do
> dftab[1,'a',3]
[1] 7
A combination of tapply and apply can create the data you want:
tapply(df$id,df$id,function(x) apply(df[df$id==x[1],-1],2,table))
However, when a group doesn't contain every value, as with id 1 and column a (which has no 2s), the result for that id is a list rather than a tidy table (matrix).
$`1`
$`1`$a
1 3
3 7
$`1`$b
1 2 3
8 1 1
$`1`$c
1 2 3
2 4 4
$`1`$d
1 2 3
2 3 5
$`1`$e
1 2 3
4 3 3
$`2`
a b c d e
1 4 6 3 2 4
2 3 3 3 6 2
3 3 1 4 2 4
$`3`
a b c d e
1 4 2 1 5 1
2 1 4 5 3 4
3 5 4 4 2 5
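The ragged list for id 1 arises because table() only reports values that actually occur. One possible way around it, sketched here under the assumption that all columns share the same set of possible values, is to convert each column to a factor with common levels before tabulating, so every group yields a matrix of the same shape. Only columns a and b from the question are reproduced to keep the example short; lv and counts are illustrative names.

```r
# Columns id, a and b as defined in the question.
id <- rep(1:3, each = 10)
a <- c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3)
b <- c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2)
df <- data.frame(id, a, b)

# Common set of levels across all value columns (here 1, 2, 3).
lv <- sort(unique(unlist(df[-1])))

# Split by id, then count each column against the common levels.
# Absent values get a count of 0, so sapply() can always build a matrix.
counts <- lapply(split(df[-1], df$id), function(g) {
  sapply(g, function(col) table(factor(col, levels = lv)))
})

print(counts[["1"]])
```

Because the levels are derived from the data, this keeps working when rows or columns are added, as long as the id column stays first.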
I'm sure someone will have a more elegant solution than this, but you can cobble it together with a simple function and dlply from the plyr package.
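library(plyr)  # provides dlply()
# Tabulate every non-id column of one group and return the tables as a named list.
ColTables <- function(df) {
  counts <- list()
  for(a in names(df)[names(df) != "id"]) {
    counts[[a]] <- table(df[a])
  }
  return(counts)
}
results <- dlply(df, "id", ColTables)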
This gets you back a list - the first "layer" of the list will be the id variable; the second the table results for each column for that id variable. For example:
> results[['2']]['a']
$a
1 2 3
4 3 3
For id variable = 2, column = a, per your above example.
A way to do it is using the aggregate function, but you have to add a column to your dataframe
> df$freq <- 0
> aggregate(freq~a+id,df,length)
a id freq
1 1 1 3
2 3 1 7
3 1 2 4
4 2 2 3
5 3 2 3
6 1 3 4
7 2 3 1
8 3 3 5
Of course you can write a function to do it, so it's easier to do it frequently, and you don't have to add a column to your actual data frame
> frequency <- function(df,groups) {
+ relevant <- df[,groups]
+ relevant$freq <- 0
+ aggregate(freq~.,relevant,length)
+ }
> frequency(df,c("b","id"))
b id freq
1 1 1 8
2 2 1 1
3 3 1 1
4 1 2 6
5 2 2 3
6 3 2 1
7 1 3 2
8 2 3 4
9 3 3 4
You didn't say how you'd like the data. The by function might give you the output you like.
by(df, df$id, function(x) lapply(x[,-1], table))
