How to calculate perplexity of BERTopic? - bert-language-model

Is there a way to calculate the perplexity of BERTopic? I am unable to find any such functionality in the BERTopic library or elsewhere.

Related

Estimating Robust Standard Errors from Covariate Balanced Propensity Score Output

I'm using the Covariate Balancing Propensity Score (CBPS) package and I want to estimate robust standard errors for my ATT results that incorporate the weights. The MatchIt and twang tutorials both recommend using the survey package to incorporate weights into the estimate of robust standard errors, and it seems to work:
library(survey)  # needed for svydesign() and svyglm()
design.CBPS <- svydesign(ids = ~1, weights = CBPS.object$weights, data = SUCCESS_All.01)
SE <- svyglm(dv ~ treatment, design = design.CBPS)  # design-based (robust) SEs
summary(SE)
Additionally, the survey SEs are substantially different from the default lm() estimates of the coefficients and SEs provided by the CBPS package. For those more familiar with either the CBPS or survey packages, is there any reason why this would be inappropriate or violate some assumption of the CBPS method? I don't see anything in the CBPS documentation about how best to estimate standard errors, so that's why I'm slightly concerned.
Sandwich (robust) standard errors are the most commonly used standard errors after propensity score weighting (including CBPS). For the ATE, they are known to be conservative (too large), and for the ATT, they can be either too large or too small. For parametric methods like CBPS, it is possible to use M-estimation to account for both the estimation of the propensity scores and the outcome model, but this is fairly complicated, especially for specialized models like CBPS.
The alternative is to use the bootstrap, where you bootstrap both the propensity score estimation and estimation of the treatment effect. The WeightIt documentation contains an example of how to do bootstrapping to estimate the confidence interval around a treatment effect estimate.
Using the survey package is one way to get robust standard errors, but there are other options, such as the sandwich package recommended in the MatchIt documentation. Under no circumstances should you use, or even consider, the usual lm() standard errors; these are completely inaccurate with inverse probability weights. The AsyVar() function in CBPS seems like it should provide valid standard errors, but in my experience these are also wildly inaccurate (compared to a bootstrap); the function doesn't even get the treatment effect right.
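For reference, a minimal sketch of the sandwich approach, reusing the (assumed) objects from the question; HC0 here is the classical sandwich estimator, and other HC types are available:

library(sandwich)
library(lmtest)
# Weighted outcome model: the lm() point estimate is fine; only its default SEs are not
fit <- lm(dv ~ treatment, data = SUCCESS_All.01, weights = CBPS.object$weights)
coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))  # robust (sandwich) standard errors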
I recommend you use a bootstrap. It may take some time (you ideally want around 1000 bootstrap replications), but these standard errors will be the most accurate.
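A rough sketch of what that bootstrap can look like, assuming hypothetical covariates x1 and x2 and the data frame from the question (the WeightIt documentation has a fuller worked example):

library(boot)
library(CBPS)
# Re-estimate the propensity score model within every bootstrap replication
boot_att <- function(data, idx) {
  d   <- data[idx, ]
  ps  <- CBPS(treatment ~ x1 + x2, data = d, ATT = 1)  # x1, x2 are assumed covariates
  fit <- lm(dv ~ treatment, data = d, weights = ps$weights)
  coef(fit)["treatment"]                               # ATT point estimate
}
set.seed(123)
res <- boot(SUCCESS_All.01, boot_att, R = 1000)        # ~1000 replications
boot.ci(res, type = "perc")                            # percentile bootstrap CI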

Calculate coverage rate of an OLS estimator

I want to calculate the 95% coverage rate of a simple OLS estimator.
The (for me) tricky addition is that the independent variable has 91 values that I have to test against each other in order to see which value leads to the best estimate.
For each value of the independent variable I want to draw 1000 samples.
I tried looking up the theory, and also searched platforms such as Stack Overflow, but I didn't manage to find an appropriate answer.
My biggest question is how to calculate the coverage rate of a 95% confidence interval.
I would deeply appreciate any possibilities or insights you could provide.
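Coverage is just the fraction of simulated confidence intervals that contain the true parameter value. A minimal sketch for a single setting (the true slope and sample size are assumed toy values; loop this over your 91 settings):

set.seed(1)
beta_true <- 2        # assumed true slope
n_reps    <- 1000     # samples per setting, as in the question
covered   <- logical(n_reps)
for (r in seq_len(n_reps)) {
  x  <- rnorm(100)
  y  <- 1 + beta_true * x + rnorm(100)
  ci <- confint(lm(y ~ x))["x", ]                 # 95% CI for the slope
  covered[r] <- ci[1] <= beta_true && beta_true <= ci[2]
}
mean(covered)  # coverage rate; should be close to 0.95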

How can one calculate ROC's AUCs in complex designs with clustering in R?

The packages I've found so far that calculate AUCs do not account for sample clustering, which increases standard errors relative to simple random sampling. I wonder whether the AUCs these packages provide could be recalculated to allow for clustering.
Thank you.
Your best bet is probably replicate weights, as long as you can get point estimates of AUC that incorporate weights.
If you convert your design into a replicate-weights design object (using survey::as.svrepdesign()), you can then evaluate any R function or expression with the replicate weights via survey::withReplicates(), which returns a standard error.
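A rough sketch of that workflow, assuming a data frame auc_data with columns cluster (cluster IDs), w (sampling weights), y (binary outcome), and p (predicted score); the weighted AUC here is implemented as a weighted concordance probability, which is one common definition:

library(survey)
des  <- svydesign(ids = ~cluster, weights = ~w, data = auc_data)
rdes <- as.svrepdesign(des, type = "bootstrap")  # replicate-weights design
# Weighted AUC as weighted concordance; tied scores count 1/2
wAUC <- function(wts, dat) {
  pos  <- dat$y == 1
  neg  <- dat$y == 0
  conc <- outer(dat$p[pos], dat$p[neg], ">") + 0.5 * outer(dat$p[pos], dat$p[neg], "==")
  ww   <- outer(wts[pos], wts[neg])
  sum(ww * conc) / sum(ww)
}
withReplicates(rdes, wAUC)  # point estimate plus a replication-based SE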

Optimising weights in R (Portfolio)

I tried several packages in R and I am really lost as to which one I should be using. I just need help with the general direction, and I can find my way to the exact code myself.
I am trying to do portfolio optimization in R. I need a weights vector to be calculated, where each weight in the vector represents the percentage of that stock.
Given the weights, I calculate total return, variance, and Sharpe ratio (a function of return and variance).
There could be constraints, such as the weights summing to 1 (100%), and maybe others on a case-by-case basis.
I am trying to make my code flexible enough that I can optimize with different objectives (one at a time). For example, I could want minimum variance in one run, maximum return in another, and even maximum Sharpe ratio in yet another.
This is pretty straightforward in Excel with the Solver add-in. Once I have the formulas entered, whichever cell I pick as the objective, Solver calculates the weights based on it, and the other parameters follow from those weights. (E.g., if I optimize for minimum variance, it calculates the weights for minimum variance and then computes the return and Sharpe ratio from those weights.)
I am wondering how to go about this in R. I am lost reading the documentation of several R packages and functions (lpSolve, optim, constrOptim, PortfolioAnalytics, etc.) and unable to find a starting point. My specific questions are:
Which would be the right R package for this kind of analysis?
Do I need to define separate functions for each possible objective (variance, return, Sharpe) and optimize those? This is a little tricky because the Sharpe ratio depends on variance and return, so if I want to optimize the Sharpe function, do I need to nest it within the variance and return functions?
I just need some ideas on how to start and I can give it a try; even pointing me to the right package and a suitable example would be great. I searched a lot on the web but I am really lost.
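To give a concrete starting point, here is a minimal sketch in base R with optim(), under assumptions: R_mat is a hypothetical matrix of period returns (rows = periods, columns = stocks), the only constraint is that the weights sum to 1 (imposed by renormalizing inside the objective), and the risk-free rate defaults to 0. Dedicated packages such as PortfolioAnalytics handle richer constraint sets:

# Portfolio statistics for a weight vector; weights are renormalized to sum to 1
port_stats <- function(w, R_mat, rf = 0) {
  w   <- w / sum(w)
  ret <- sum(colMeans(R_mat) * w)                # expected portfolio return
  sig <- sqrt(drop(t(w) %*% cov(R_mat) %*% w))   # portfolio standard deviation
  list(ret = ret, var = sig^2, sharpe = (ret - rf) / sig)
}
# One generic optimizer; the objective is picked at call time (one at a time)
optimize_port <- function(R_mat, objective = c("var", "ret", "sharpe")) {
  objective <- match.arg(objective)
  sgn <- if (objective == "var") 1 else -1       # minimize var, maximize ret/sharpe
  f   <- function(w) sgn * port_stats(w, R_mat)[[objective]]
  w0  <- rep(1 / ncol(R_mat), ncol(R_mat))       # start from equal weights
  res <- optim(w0, f, method = "L-BFGS-B", lower = 1e-6, upper = 1)
  res$par / sum(res$par)                         # normalized optimal weights
}

Because port_stats() returns return, variance, and Sharpe together, the Sharpe objective reuses the same return and variance calculations rather than nesting separate functions, and calling port_stats() on the optimal weights reproduces the Excel workflow of reading off the other parameters.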

Meta-analysis with multiple outcome variables

As you might be able to tell from the sample of my dataset, it contains a lot of dependency, with each study providing multiple outcomes for the construct I am looking at. I was planning on using the metacor library because I only have information about the sample sizes, not the variances. However, all the methods I have come across that deal with dependency, such as the robumeta package, require variances. (I know some people average the effect sizes within a study, but I have read that this tends to produce larger error rates.) Do you know if there is an equivalent package that uses only sample size, or is it mathematically impossible to determine the weights without variances?
Please note that I am a student, no expert.
You could use the escalc function of the metafor package to calculate variances for each effect size. In the case of correlations, it only needs the raw correlation coefficients and the corresponding sample sizes.
See the section "Outcome Measures for Variable Association" in the escalc documentation: https://www.rdocumentation.org/packages/metafor/versions/2.1-0/topics/escalc
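A minimal sketch, assuming a data frame dat with columns ri (raw correlations) and ni (sample sizes); the multilevel model at the end is one common way to handle several effect sizes per study (study and es_id are assumed ID columns, not part of the question):

library(metafor)
# Fisher's z transformation: adds yi (effect size) and vi (sampling variance)
dat <- escalc(measure = "ZCOR", ri = ri, ni = ni, data = dat)
# Multilevel model with effect sizes nested within studies to model the dependency
res <- rma.mv(yi, vi, random = ~ 1 | study/es_id, data = dat)
summary(res)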
