Causal relationship with deal (Bayes-network) - r

I am working with multiple binary vectors, e.g. A, B, C, D, E, F, G, H.
I want to find the relationships between them and use the result for classification. I have tried the following:
log_data<-read.csv(choose.files(), as.is = T, header = T, blank.lines.skip = TRUE)
data<-log_data[2:ncol(log_data)]
data
TIME A B C D E F G
1 1 1 1 0 1 0 1 1
2 0 0 1 1 1 1 0 1
3 1 1 1 1 1 0 1 1
4 1 0 1 1 1 1 0 1
.....................
fit <- network(data)
fit.prior <- jointprior(fit)
fit <- getnetwork(learn(fit,rats,fit.prior))
**Error in postc0c(node$condposterior[[1]]$mu, node$condposterior[[1]]$tau, :
NA/NaN/Inf in foreign function call (arg 1)**
I think I am getting this error because all the variables are treated as continuous and mu is NULL.
How should I proceed in order to classify after creating the network?
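One thing that may be worth trying: as far as I know, deal decides whether a node is discrete or continuous from the column class (factor versus numeric), so 0/1 columns read from a CSV end up as continuous nodes. A minimal sketch of converting them to factors first, using a made-up toy data frame in place of the real CSV (the column values below are an assumption, not the asker's data):

```r
# Hypothetical toy data standing in for the 0/1 columns read from the CSV.
data <- data.frame(
  A = c(1, 0, 1, 1),
  B = c(1, 0, 1, 0),
  C = c(1, 1, 1, 1),
  D = c(0, 1, 1, 1)
)

# Convert every column to a two-level factor so it is treated as discrete.
data[] <- lapply(data, function(col) factor(col, levels = c(0, 1)))
str(data)

# With the deal package installed, the original calls could then be retried,
# passing the same data to learn() (the question passes 'rats' there):
# fit       <- network(data)
# fit.prior <- jointprior(fit)
# fit       <- getnetwork(learn(fit, data, fit.prior))
```

Whether this removes the NA/NaN/Inf error on the real data is untested here; it only addresses the "all continuous" part of the diagnosis.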

Change variable names in R with a for loop

I'm trying to create a series of variables in a loop, where each variable should be named like this:
M000
M001
M002
M003
Example
c.n_vars <- nrow(comb)
for (i in 1:c.n_vars)
{
paste("M",comb[i,1],comb[i,2],comb[i,3]) = Arima(y,order=c(arima[1,1],arima[2,1],arima[3,1]),seasonal=list(order=c(comb[i,1],comb[i,2],comb[i,3]),period=12))
}
where comb is all combinations
a <- c(0,1,2,3,4)
b <- c(0,1,2,3,4)
c <- c(0,1,2,3,4)
comb <- expand.grid(a, b,c)
row parameter1 parameter2 parameter3
1 0 0 0
2 1 0 0
3 2 0 0
4 3 0 0
5 4 0 0
6 0 1 0
7 1 1 0
8 2 1 0
9 3 1 0
10 4 1 0
11 0 2 0
12 1 2 0
13 2 2 0
14 3 2 0
15 4 2 0
and arima is
arima <- data.frame(c(2,1,4))
row parameters
1 2
2 1
3 4
I am trying to create
c.n_vars <- nrow(comb)
for (i in 1:c.n_vars)
{
paste("M",comb[i,1],comb[i,2],comb[i,3]) = Arima(y,order=c(arima[1,1],arima[2,1],arima[3,1]),seasonal=list(order=c(comb[i,1],comb[i,2],comb[i,3]),period=12))
}
this code must return
for i = 1
M000 = arima model saved in that variable
for i = 2
M100 = arima model saved in that variable
for i = 3
M200 = arima model saved in that variable
.
.
.
.
.
for i = 15
M420 = arima model saved in that variable
and the following error appears:
Error in paste("M", comb[i, 1], comb[i, 2], comb[i, 3]) = Arima(y, order = c(arima[1, :
target of assignment expands to non-language object
I need the result of each iteration of 'i' to be saved in a different variable.
Is there any solution, or another way to do it?
Your sample code is still incomplete. I was not able to run it. For example y is missing.
As Base_R_Best_R pointed out, you cannot use paste to create variables like that. You can use the following pattern instead. Also note that I replaced paste() with paste0() to avoid spaces in the names:
result = list()
for (i in 1:c.n_vars)
{
result[[paste0("M",comb[i,1],comb[i,2],comb[i,3])]] = Arima(y,order=c(arima[1,1],arima[2,1],arima[3,1]),seasonal=list(order=c(comb[i,1],comb[i,2],comb[i,3]),period=12))
}
Access your variables like this:
result$M100
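The same named-list pattern can also be written without an explicit loop. This is only a sketch: fit_one() is a hypothetical stand-in for the Arima() call (so it runs without the forecast package or y), and the grid is cut down to 0:1 to keep it short.

```r
# Smaller grid than in the question, just for illustration.
comb <- expand.grid(a = 0:1, b = 0:1, c = 0:1)

# Names such as "M000", "M100", ... built without separators.
model_names <- paste0("M", comb$a, comb$b, comb$c)

# Hypothetical stand-in for Arima(); it only records the seasonal order.
fit_one <- function(i) list(seasonal_order = unlist(comb[i, ]))

# One list element per row of comb, named after the combination.
result <- setNames(lapply(seq_len(nrow(comb)), fit_one), model_names)

names(result)
result$M100$seasonal_order
```

Keeping the models in one list also makes it easy to loop over them later, e.g. with sapply(result, ...), which separate M000, M100, ... variables would not.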

Using LDAvis when doc_term_matrix has at least one row in which all elements are zero

I'm using LDA for topic modeling:
dtm <- DocumentTermMatrix(docs)
However, dtm contains rows in which all elements are zero, so I followed the instructions here:
ui = unique(dtm$i)
dtm.new = dtm[ui,]
And then LDA works and I have the topics and everything. My next attempt is to use LDAvis as recommended here. Source code:
topicmodels_json_ldavis <- function(fitted, corpus, doc_term){
  # Required packages
  library(topicmodels)
  library(dplyr)
  library(stringi)
  library(tm)
  library(LDAvis)
  # Find required quantities
  phi <- posterior(fitted)$terms %>% as.matrix
  theta <- posterior(fitted)$topics %>% as.matrix
  vocab <- colnames(phi)
  doc_length <- vector()
  for (i in 1:length(corpus)) {
    temp <- paste(corpus[[i]]$content, collapse = ' ')
    doc_length <- c(doc_length, stri_count(temp, regex = '\\S+'))
  }
  temp_frequency <- inspect(doc_term)
  freq_matrix <- data.frame(ST = colnames(temp_frequency),
                            Freq = colSums(temp_frequency))
  rm(temp_frequency)
  # Convert to json
  json_lda <- LDAvis::createJSON(phi = phi, theta = theta,
                                 vocab = vocab,
                                 doc.length = doc_length,
                                 term.frequency = freq_matrix$Freq)
  return(json_lda)
}
When I call topicmodels_json_ldavis function, I receive this error:
Length of doc.length not equal to the number of rows in theta;
both should be equal to the number of documents in the data.
I checked the lengths of theta and doc.length, and they are different. I assume this is because I pass the corpus (docs), which produces a dtm with at least one zero row. To make the corpus match the document-term matrix, I decided to build a new corpus from dtm.new, as suggested here. Source code:
dtm2list <- apply(dtm, 1, function(x) {
paste(rep(names(x), x), collapse=" ")
})
myCorp <- VCorpus(VectorSource(dtm2list))
I even fitted a new model, ldaOut22, on dtm.new and passed the following arguments to topicmodels_json_ldavis: ldaOut22, myCorp, dtm.new
I still receive the error message saying that theta and doc.length must have the same length.
I had the exact same problem. I was able to remove the all-zero rows for the LDA analysis, but then stumbled over the row count of the sparse matrix no longer matching the document count needed for LDAvis. I've solved it, unfortunately only in Python, but you may use the following approach as a starting point:
Let's see what I have first:
import numpy as np
# cvz is the sparse term-frequency matrix built earlier (not shown here)
sparseCountMatrix = np.array(cvz.toarray())
x = sparseCountMatrix  # short alias used in the prints below
print(f'The tf matrix:\n {cvz.toarray()[:100]}\n')
print(f'Number of non-zero vectors: {len(x[x>0])} Number of zero vectors: {len(x[x==0])}\n')
print(f'Have a look at the non-zero vectors:\n{x[x>0][:200]}\n')
print(f'This is our sparse matrix with {x.shape[0]} (# of documents) by {x.shape[1]} (# of terms in the corpus):\n{x.shape}')
Output:
The tf matrix:
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
Number of non-zero vectors: 4721 Number of zero vectors: 232354
Have a look at the non-zero vectors:
[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
This is our sparse matrix with 545 (# of documents) by 435 (# of terms in the corpus):
(545, 435)
How many rows consist entirely of zeros?
len(list(np.array(sparseCountMatrix[(sparseCountMatrix==0).all(1)])))
Output: 12
How many rows contain at least one non-zero element?
len(list(np.array(sparseCountMatrix[~(sparseCountMatrix==0).all(1)])))
Output: 533
Remove the 12 rows which contain all zero vectors for LDA Analysis:
cleanedSparseCountMatrix = np.array(sparseCountMatrix[~(sparseCountMatrix==0).all(1)])
Also remove these documents from original Pandas Series (tokens), so document count matches sparse matrix row count, which is important to visualize LDA results with pyLDAVis:
First, to get the index position of rows with all zero vectors, use np.where:
indexesToDrop = np.where((sparseCountMatrix==0).all(1))
print(f"Indexes with all zero vectors: {indexesToDrop}\n")
Output:
Indexes with all zero vectors: (array([ 47, 77, 88, 95, 106, 109, 127, 244, 363, 364, 367, 369],
dtype=int64),)
Second, use this list of indexes to drop original rows in Pandas series with series.drop:
data_tokens_cleaned = data['tokens'].drop(data['tokens'].index[indexesToDrop])
New length of cleaned tokens (should match sparse matrix length!):
len(data_tokens_cleaned)
Output:
533
This is our cleaned sparse matrix, ready for LDA analysis:
print(cleanedSparseCountMatrix.shape)
Output: (533, 435)
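For the R side of the question, the same idea can be sketched in base R: drop the all-zero rows once, and derive doc.length and term.frequency from that same filtered matrix so that they cannot get out of step with theta. The toy matrix below is made up; with a real tm DocumentTermMatrix one would presumably use as.matrix(dtm.new) or slam::row_sums(dtm.new) instead of the plain matrix assumed here.

```r
# Hypothetical toy document-term counts; document "d2" has no terms at all.
counts <- matrix(c(2, 0, 1,
                   0, 0, 0,
                   1, 3, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("d1", "d2", "d3"),
                                 c("apple", "pear", "plum")))

# Keep only documents with at least one term (the analogue of dtm.new).
keep <- rowSums(counts) > 0
counts_new <- counts[keep, , drop = FALSE]

# Both quantities come from the filtered matrix, so their sizes match it.
doc_length     <- rowSums(counts_new)   # one entry per remaining document
term_frequency <- colSums(counts_new)   # one entry per vocabulary term

length(doc_length) == nrow(counts_new)
```

If the model is fitted on the filtered matrix, theta has one row per remaining document, which is the length createJSON checks doc.length against.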

Why is the adjust option in R's logit function failing to adjust my data?

I need to find out why the adjust option in R's logit function (package "car") doesn't seem to be working.
I'm using R 3.2.2 on a project with a data file that has 450K records. The file contains five columns: four discrete variables (allowable values 0, 1, 2 or 3) and the response variable "sharer_prob" (allowable values 0 to 1; the data set contains no values equal to 0, but many equal to 1).
Because logit appeared to be choking on the large file (I only have 8GB of RAM on this machine), I created a random sample without replacement ("sampleset" contains 4125 records, which R should be able to handle, right?). The first 30 rows of that file look like this:
Index Sympathy Trust Fear Greed sharer_prob
817394 2 2 1 2 0.369541
167050 1 1 3 2 0.715611
822867 2 3 2 0 0.818810
323684 2 1 0 0 0.636559
515278 2 2 2 0 0.947242
133239 3 3 0 0 0.938594
459271 3 2 0 0 0.647838
29310 3 2 1 1 0.749249
534875 2 2 2 3 0.608888
656900 3 2 0 0 0.653784
418271 3 2 1 0 0.539343
616221 2 1 0 0 0.600135
592467 3 3 0 0 0.798354
905925 3 2 0 0 0.797031
612514 1 1 3 2 0.554193
8564 1 1 3 2 0.498373
319957 3 3 0 0 1.000000
867410 3 2 0 0 0.581564
551327 3 2 0 0 0.671110
864973 3 3 0 0 1.000000
450469 3 1 2 0 0.578244
877834 2 1 0 0 0.291015
410507 3 3 0 0 0.705201
198044 3 3 0 0 0.772060
548567 3 2 0 0 0.799164
435068 3 2 0 0 0.805771
518144 3 2 1 1 0.396816
609747 3 3 0 0 0.814917
398375 2 1 0 0 0.428143
854396 1 1 3 2 0.511154
Because I'm trying to identify significant interaction effects among pairs of the input variables, I used the following code to create interaction terms:
IX_greed <- sampleset$greed * sampleset$sharer_prob
IX_sympathy <- sampleset$sympathy * sampleset$sharer_prob
IX_fear <- sampleset$fear * sampleset$sharer_prob
IX_trust <- sampleset$trust * sampleset$sharer_prob
I used this line of code to try to regress one of the pairs on the logit of the response variable (sharer_prob):
lm.FG=lm( logit( sharer_prob, adjust=TRUE ) ~ IX_fear * IX_greed, na.action=NULL, data=sampleset )
But R gave the following error, which suggests that it failed to adjust the values of 1 to 0.975:
**Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
NA/NaN/Inf in 'y'**
Can anyone tell me what I've done wrong and how to fix it?
Many thanks for any help you can provide.
Larry John
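To make the expected adjustment concrete, here is a small base R sketch of remapping proportions from [0, 1] to [0.025, 0.975] before taking the logit. It is written from scratch and is not the car implementation; remap_logit and its adjust argument are names made up for this example, with adjust assumed to be a numeric factor such as 0.025 rather than TRUE.

```r
# Hypothetical helper: shrink p towards 0.5 so that 0 and 1 stay finite.
remap_logit <- function(p, adjust = 0.025) {
  stopifnot(all(p >= 0 & p <= 1), adjust >= 0, adjust < 0.5)
  p_adj <- adjust + (1 - 2 * adjust) * p   # maps [0, 1] to [adjust, 1 - adjust]
  log(p_adj / (1 - p_adj))
}

p <- c(0.369541, 0.715611, 1)   # two values from the sample plus a value of 1

remap_logit(p)      # all finite; the value 1 is treated as 0.975
log(p / (1 - p))    # plain logit: the last element is Inf
```

An infinite value like the last one is the kind of thing that makes lm.fit stop with "NA/NaN/Inf in 'y'", so checking is.finite() on the transformed response before fitting may help narrow the problem down.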

Splitting one Column to Multiple R and Giving logical value if true

I am trying to split one column of a data frame into multiple columns, using the values from the original column as the new column names. Each new column should hold a 1 if that value occurred in the original column for that row, and a 0 if not. I realize this is not the best explanation, so here is an example:
df <- data.frame(subject = c(1:4), Location = c('A', 'A/B', 'B/C/D', 'A/B/C/D'))
# subject Location
# 1 1 A
# 2 2 A/B
# 3 3 B/C/D
# 4 4 A/B/C/D
and would like to expand it to wide format, something such as, with 1's and 0's (or T and F):
# subject A B C D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
I have looked into tidyr and the separate function, and reshape2 and the cast function, but I seem to be getting hung up on producing the logical values. Any help on the issue would be greatly appreciated. Thank you.
You may try cSplit_e from package splitstackshape:
library(splitstackshape)
cSplit_e(data = df, split.col = "Location", sep = "/",
type = "character", drop = TRUE, fill = 0)
# subject Location_A Location_B Location_C Location_D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
You could take the following step-by-step approach.
## get the unique values after splitting
u <- unique(unlist(strsplit(as.character(df$Location), "/")))
## compare 'u' with 'Location' (one logical per row of 'df' for each value in 'u')
m <- vapply(u, grepl, logical(nrow(df)), x = df$Location)
## coerce to integer representation
m[] <- as.integer(m)
## bind 'm' to 'subject'
cbind(df["subject"], m)
# subject A B C D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
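One caveat with grepl is that it matches substrings, so a location named "A" would also match "AB". A variant that compares whole tokens instead is sketched below; it assumes Location is a character column (hence stringsAsFactors = FALSE) and that there is more than one distinct location.

```r
df <- data.frame(subject = 1:4,
                 Location = c("A", "A/B", "B/C/D", "A/B/C/D"),
                 stringsAsFactors = FALSE)

# Split each row into its tokens, then collect the distinct tokens.
parts <- strsplit(df$Location, "/", fixed = TRUE)
u <- sort(unique(unlist(parts)))

# For each row, mark which of the distinct tokens it contains (exact match).
m <- t(vapply(parts, function(p) as.integer(u %in% p), integer(length(u))))
colnames(m) <- u

wide <- cbind(df["subject"], m)
wide
```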

How to create a variable that indicates agreement from two dichotomous variables

I'd like to create a new variable that contains 1s and 0s. A 1 represents agreement between the raters (both raters gave 1 or both gave 0) and a 0 represents disagreement.
rater_A <- c(1,0,1,1,1,0,0,1,0,0)
rater_B <- c(1,1,0,0,1,1,0,1,0,0)
df <- cbind(rater_A, rater_B)
The new variable would be like the following vector I created manually:
df$agreement <- c(1,0,0,0,1,0,1,1,1,1)
Maybe there's a package or a function I don't know. Any help would be great.
You could create df as a data.frame (instead of using cbind) and use within and ifelse:
rater_A <- c(1,0,1,1,1,0,0,1,0,0)
rater_B <- c(1,1,0,0,1,1,0,1,0,0)
df <- data.frame(rater_A, rater_B)
##
df <- within(df,
agreement <- ifelse(
rater_A==rater_B,1,0))
##
> df
rater_A rater_B agreement
1 1 1 1
2 0 1 0
3 1 0 0
4 1 0 0
5 1 1 1
6 0 1 0
7 0 0 1
8 1 1 1
9 0 0 1
10 0 0 1
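Since the comparison is already vectorized, the ifelse step can also be skipped by converting the logical result directly. A short sketch with the same data:

```r
rater_A <- c(1, 0, 1, 1, 1, 0, 0, 1, 0, 0)
rater_B <- c(1, 1, 0, 0, 1, 1, 0, 1, 0, 0)
df <- data.frame(rater_A, rater_B)

# TRUE/FALSE from the comparison becomes 1/0.
df$agreement <- as.integer(df$rater_A == df$rater_B)

# Proportion of cases on which the raters agree.
mean(df$agreement)
```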
