Running IRT with correlated items and many missing values - r

I'm hoping to run an IRT analysis where I have a couple of families of items, and each person has answered one question from each family. Are there R packages for doing something like this? I've looked at options like hIRT, but that seems to only account for individual-level covariates, not item-level covariates. I'm expecting the data to look something like this:
    1a  1b  1c  2a  2b  2c
     1   0   0   1   1   1
where each person answers one of 1a, 1b, and 1c, and one of 2a, 2b, and 2c.
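One possible direction, sketched here purely as an illustration (the mirt package is an assumption; the question does not mention it): mirt treats NA responses as missing, so a data frame with one answered item per family and NAs elsewhere can be fit directly, and the family structure could then be imposed through its model syntax.

    # hedged sketch with made-up data: 300 persons, items 1a-1c and 2a-2c,
    # each person answering exactly one item from each family
    library(mirt)

    set.seed(1)
    resp <- as.data.frame(matrix(NA_integer_, nrow = 300, ncol = 6,
                                 dimnames = list(NULL, c("i1a", "i1b", "i1c",
                                                         "i2a", "i2b", "i2c"))))
    for (i in 1:300) {
      resp[i, sample(1:3, 1)]     <- rbinom(1, 1, 0.5)  # one item from family 1
      resp[i, 3 + sample(1:3, 1)] <- rbinom(1, 1, 0.5)  # one item from family 2
    }

    # unidimensional 2PL; mirt simply skips the NA cells during estimation.
    # Items within a family could also be constrained to share parameters
    # via mirt.model() syntax if that matches the design.
    fit <- mirt(resp, model = 1, itemtype = "2PL")
    coef(fit, simplify = TRUE)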

Related

Cluster analysis in R on large data set

I have a data set with rankings as the column names and about 15,000 contestants. My data looks like:
    contestant    1    2    3    4
           101   13    0    5   12
            14    0    1   34    6
           ...  ...  ...  ...  ...
           500    0    2   23    3
I've been working on doing cluster analysis on this dataset. Dendrograms are obviously not very helpful here: they produce a thick block of lines because of the large number of entries.
I'm wondering if there is a better way to do cluster analysis with this type of data. I've tried fviz_cluster() and similar commands, and I've gone through multiple tutorials, but many of them guided me through making dendrograms, and their example data always seems different from mine (comparing two variables, etc.) and much smaller. Essentially, I'm asking which types of cluster analysis may work well with this type of data.
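One direction, sketched here as an illustration with made-up data (the column names are hypothetical): with roughly 15,000 rows, a partitioning method such as k-means usually scales much better than hierarchical clustering, and the factoextra helpers can still be used for the plots.

    # hedged sketch: random counts standing in for the ranking columns
    library(factoextra)

    set.seed(42)
    rank_counts <- matrix(rpois(15000 * 4, lambda = 5), ncol = 4,
                          dimnames = list(NULL, paste0("rank_", 1:4)))

    scaled <- scale(rank_counts)   # put the rank columns on a common scale
    fviz_nbclust(scaled, kmeans, method = "wss", k.max = 10)  # elbow plot to pick k

    km <- kmeans(scaled, centers = 4, nstart = 25)
    fviz_cluster(km, data = scaled, geom = "point")  # points on the first two PCs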

Why is R adding empty factors to my data?

I have a simple data set in R: there are 2 conditions, in a variable called "COND", and within those conditions adults chose between one of 2 pictures, which we call house or car. That variable is called "SAW".
I have 69 people, and 69 rows of data.
For some reason, R is adding an empty factor level to both variables. How do I get rid of it?
When I call table() to see how many observations are in each level, this is the output:
    table(MazeData$SAW)

              car house
        2       9    59

    table(MazeData$COND)

           Apples No_Apples
        2      35        33
Where the heck are these 2 mystery rows coming from? It won't let me make my simple box plots and bar plots or run t.test() because of this. Can someone help? Thanks!!
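A minimal sketch of the usual fix, assuming the blank level comes from empty strings in the raw file (the most common cause): recode the empty strings to NA and drop the unused factor levels.

    # hedged sketch: blank cells read in as "" become their own factor level;
    # recode them to NA and drop the now-unused levels
    MazeData$SAW[MazeData$SAW == ""]   <- NA
    MazeData$COND[MazeData$COND == ""] <- NA
    MazeData <- droplevels(MazeData)

    table(MazeData$SAW)    # should now show only car and house
    table(MazeData$COND)   # should now show only Apples and No_Apples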

Using a column that contains a frequency/weight/count in R [closed]

This is an easy question to ask, but a hard one to search for. "Frequency" is used all over the place. I tried a synonym (weight), but since mtcars is so widely used, I get a lot of false positives as well. Same thing for "counts".
I'm looking at datasets::HairEyeColor, partly reproduced here:
        Hair   Eye  Sex Freq
    1  Black Brown Male   32
    2  Brown Brown Male   53
    3    Red Brown Male   10
    4  Blond Brown Male    3
    5  Black  Blue Male   11
    6  Brown  Blue Male   50
    7    Red  Blue Male   10
    8  Blond  Blue Male   30
    9  Black Hazel Male   10
    10 Brown Hazel Male   25
    ...
I came across this when trying to show someone how to make a mosaic plot of any two of Hair, Eye, and Sex. On first read, I didn't see a way to specify a column that says "this row represents 32 of the set members", but I didn't read too carefully.
I suppose I could reshape the data using melt() and reshape() every time I receive data with a frequency column, but that seems kind of drastic.
In other languages I know, I could add a parameter to the fitting function to let it know "there's not just one row with this set of levels, there are n of them." So if I wanted to see a distribution, I might say
DISTR(Y=Hair, FREQ=freq)
...which would generate a histogram or density plot with n values per row
Alternately,
lm(hair ~ eye + sex, data = 'HairEyeColor', freq = 'freq')
would fit a linear model with 32 replications of the first row rather than 1.
I’m asking about a way to use the 32 in the first row (for example) to tell the modeling or graphing function that there are 32 cases with this combination of levels, 53 with the combination in the second row, etc.
Surely this kind of data shows up a lot. I see it all the time, and there's usually a way to say that this number specifies the frequency this row represents in the actual data: rather than a data table with 32 rows of Black, Brown, Male, there's one row with frequency 32.
(No plyr please.)
No, there is not a standard way to use this type of data across all of R.
Many of the basic modeling functions, e.g., lm, glm, nls, loess, and more from the stats package accept a weights argument that will meet your needs. prop.test accepts data in either format. But many other modeling functions do not, e.g., knn, princomp, and many others not in base R.
barplot accepts input in either format. mosaicplot expects input as an aggregated contingency table. Other types of plots would require more custom handling, because there are a lot of different things you could do with frequency.
Of course, anything not in base R is up to whoever writes it.
ggplot2 (which is not base R) generally handles this really well, e.g., geom_bar accepts a weight aesthetic, or in the case of scatterplots you could map size or color or alpha to visually convey the frequency.
randomForest, for example, does not accept case weights (xgboost does, via the weight argument of xgb.DMatrix).
I will say that I very rarely find this to be a problem. I'd encourage you to ask specific questions about methods where it is causing you issues. I think mosaicplot is a bad example as it expects a contingency table, so the problem would be the opposite: using it with disaggregated data would require first aggregating it up to a frequency table.
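To make the weights idea concrete, here is a minimal sketch using the built-in HairEyeColor data (the particular model and plots are only illustrations, not the only options):

    # expand the 3-way table to a data frame with a Freq column
    hec <- as.data.frame(HairEyeColor)   # columns: Hair, Eye, Sex, Freq

    # glm: weight each row by its frequency instead of replicating rows
    fit <- glm(Sex ~ Hair + Eye, family = binomial, data = hec, weights = Freq)
    summary(fit)

    # mosaicplot expects an aggregated contingency table, so build one with xtabs()
    mosaicplot(xtabs(Freq ~ Hair + Eye, data = hec), main = "Hair vs Eye")

    # ggplot2: geom_bar can use the frequency directly via the weight aesthetic
    library(ggplot2)
    ggplot(hec, aes(x = Hair, fill = Eye)) + geom_bar(aes(weight = Freq))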

Is this a graph bipartition problem?

Given M foods and N persons, where each person has their own favourite foods, find the minimum number of foods needed so that every person gets something they like for breakfast.
For example:
        f1  f2  f3
    p1   1   1   0
    p2   0   0   1
    p3   0   1   0
The answer is 2, since f2 supports 2 persons and f3 supports 1 person.
It's quite unclear what you are asking here as you are not explicitly posing any questions except for the one in the heading.
This is obviously a bipartite graph, yes, but the problem you are trying to solve is not a matching problem, but rather the set cover problem.
In order to see this, make a set for each food consisting of the persons that eat that food. In your example you get the three sets {p1}, {p1, p3} and {p2} (those who like f1, f2 and f3, respectively). Finding the minimal number of these sets that together cover all of your persons, i.e. the set {p1, p2, p3}, is exactly an instance of the set cover problem.
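For small instances you can see this directly with the standard greedy heuristic for set cover (a log-factor approximation in general); the sketch below hard-codes the matrix from the question:

    # greedy set cover on the example: repeatedly pick the food that
    # covers the most still-uncovered persons
    prefs <- matrix(c(1, 1, 0,
                      0, 0, 1,
                      0, 1, 0),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("p1", "p2", "p3"), c("f1", "f2", "f3")))

    uncovered <- rownames(prefs)
    chosen <- character(0)
    while (length(uncovered) > 0) {
      gain <- colSums(prefs[uncovered, , drop = FALSE])
      best <- names(which.max(gain))
      chosen <- c(chosen, best)
      uncovered <- setdiff(uncovered, rownames(prefs)[prefs[, best] == 1])
    }
    chosen   # "f2" "f3": two foods cover everyone in this example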

How can I make association rules considering count information?

Hi, I'm currently studying association rules with R, and I have a question.
In transaction data we consider just buy or non-buy (binary data). I want to know how to perform association rule mining with count data, for example:
      item1 item2 item3
    1     2     0     1
    2     0     1     0
    3     1     0     0
The first customer bought two of item1!
But in ordinary association rules, that count information is ignored.
How can we take that information into account?
Hi. Quantitative association rule (QAR) mining may be helpful.
First, divide the value range of every item into intervals and give every interval a unique label. Then the original dataset can be transformed into a binary dataset containing those labels.
For example, for item1, suppose the original data contains the following information:
the first person has bought 5 of item1,
the second has bought 2 of item1,
the third has bought 7 of item1.
You can divide the value range of item1 into [0, 3), [3, 6) and [6, 9), and use a1, a2 and a3 to represent them. The item 'item1' is then replaced by three new items, a1, a2 and a3, and the original data becomes:
the first person has bought one a2,
the second person has bought one a1,
the third person has bought one a3.
After doing this for every item, the original dataset has been transformed into a binary dataset.
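A minimal sketch of that binning step with the arules package (an assumption; the answer does not name a package), using made-up counts:

    # hedged sketch: bin each item's count into labelled intervals,
    # then mine the resulting binary (transaction) data
    library(arules)

    counts <- data.frame(item1 = c(5, 2, 7),
                         item2 = c(0, 1, 0),
                         item3 = c(1, 0, 0))

    # cut each count column into the intervals [0,3), [3,6), [6,9)
    binned <- as.data.frame(lapply(counts, function(x)
      cut(x, breaks = c(0, 3, 6, 9), right = FALSE,
          labels = c("low", "mid", "high"))))

    # convert the factor data frame to transactions and mine rules
    trans <- as(binned, "transactions")
    rules <- apriori(trans, parameter = list(supp = 0.1, conf = 0.5))
    inspect(rules)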
