Optimum way to perform a Bland-Altman analysis using R

Is there a way to produce a Bland-Altman plot using ggplot2?
I have looked at using MethComp but can't seem to get my data into a Meth object:
library(MethComp)
comp <- read.csv("HIVVL.csv")
com <- data.frame(comp)
co <- Meth(com)
with(co, BA.plot(Qiagen, Abbot))
I keep running into this error:
comp <- read.csv("HIVVL.csv")
com <- data.frame(comp)
co <- Meth(com)
Error in `[.data.frame`(data, , meth) : undefined columns selected
A print of com looks something like this:
Abbot Qiagen
1 66000 66057
2 40273 73376
3 13818 14684
4 53328 195509
5 8369 25000
6 89833 290000
7 116 219

Have you read ?Meth? It is looking for columns named meth and item in your data, which don't exist (see my example below).
Also, the step com <- data.frame(comp) is not doing anything different than com <- comp. read.csv already returns a data.frame.
d <- data.frame(x=1:10, y=1:10)
Meth(d)
# Error in `[.data.frame`(data, , meth) : undefined columns selected
Meth(d, meth='x')
# Error in `[.data.frame`(data, , item) : undefined columns selected
Meth(d, meth='x', item='y')
# The following variables from the dataframe
# "d" are used as the Meth variables:
# meth: x
# item: y
# y: y
# #Replicates
# Method 1 #Items #Obs: 10 Values: min med max
# 1 1 1 1 1 1 1
# 2 1 1 1 2 2 2
# 3 1 1 1 3 3 3
# 4 1 1 1 4 4 4
# 5 1 1 1 5 5 5
# 6 1 1 1 6 6 6
# 7 1 1 1 7 7 7
# 8 1 1 1 8 8 8
# 9 1 1 1 9 9 9
# 10 1 1 1 10 10 10
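For the ggplot2 part of the question, a Bland-Altman plot does not strictly need a Meth object: it is a scatter plot of the per-sample difference against the per-sample mean, with horizontal lines at the mean difference (bias) and at bias ± 1.96 standard deviations (the limits of agreement). A minimal sketch, assuming the seven rows printed in the question stand in for the full HIVVL.csv; the names ba, avg, difference, bias, lower and upper are illustrative, and the plotting step is skipped if ggplot2 is not installed:

```r
# Assumption: these seven rows (copied from the question) stand in for HIVVL.csv
com <- data.frame(
  Abbot  = c(66000, 40273, 13818, 53328, 8369, 89833, 116),
  Qiagen = c(66057, 73376, 14684, 195509, 25000, 290000, 219)
)

# Per-sample mean and difference of the two methods
ba <- data.frame(
  avg        = (com$Abbot + com$Qiagen) / 2,
  difference = com$Qiagen - com$Abbot
)

# Bias and 95% limits of agreement
bias  <- mean(ba$difference)
lower <- bias - 1.96 * sd(ba$difference)
upper <- bias + 1.96 * sd(ba$difference)

# Plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ba, aes(x = avg, y = difference)) +
    geom_point() +
    geom_hline(yintercept = bias) +
    geom_hline(yintercept = c(lower, upper), linetype = "dashed") +
    labs(x = "Mean of Abbot and Qiagen", y = "Qiagen - Abbot")
  print(p)
}
```

Which method is subtracted from which is a convention; swapping the order only flips the sign of the bias.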

Related

Recoding specific column values using reference list

My dataframe looks like this
data = data.frame(ID=c(1,2,3,4,5,6,7,8,9,10),
Gender=c('Male','Female','Female','Female','Male','Female','Male','Male','Female','Female'))
And I have a reference list that looks like this -
ref=list(Male=1,Female=2)
I'd like to replace values in the Gender column using this reference list, without adding a new column to my dataframe.
Here's my attempt
do.call(dplyr::recode, c(list(data), ref))
Which gives me the following error -
no applicable method for 'recode' applied to an object of class
"data.frame"
Any inputs would be greatly appreciated
An option would be to do a left_join after stacking the 'ref' list into a two-column data.frame:
library(dplyr)
left_join(data, stack(ref), by = c('Gender' = 'ind')) %>%
select(ID, Gender = values)
A base R approach would be
unname(unlist(ref)[as.character(data$Gender)])
#[1] 1 2 2 2 1 2 1 1 2 2
In base R:
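data$Gender = sapply(as.character(data$Gender), function(x) ref[[x]], USE.NAMES = FALSE) # as.character() so a factor column is looked up by label, not by integer code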
You can use factor, i.e.
factor(data$Gender, levels = names(ref), labels = ref)
#[1] 1 2 2 2 1 2 1 1 2 2
You can unlist ref to give you a named vector of codes, and then index this with your data:
transform(data,Gender=unlist(ref)[as.character(Gender)])
ID Gender
1 1 1
2 2 2
3 3 2
4 4 2
5 5 1
6 6 2
7 7 1
8 8 1
9 9 2
10 10 2
Surprisingly, that one works as well:
data$Gender <- ref[as.character(data$Gender)]
#> data
# ID Gender
# 1 1 1
# 2 2 2
# 3 3 2
# 4 4 2
# 5 5 1
# 6 6 2
# 7 7 1
# 8 8 1
# 9 9 2
# 10 10 2
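The lookup-based answers above differ in what they store in the column. A small sketch, using the data and ref from the question, that compares the named-vector lookup with the direct list assignment; the object names lookup, recoded and as_list are only illustrative:

```r
data <- data.frame(
  ID = 1:10,
  Gender = c("Male", "Female", "Female", "Female", "Male",
             "Female", "Male", "Male", "Female", "Female")
)
ref <- list(Male = 1, Female = 2)

# Named numeric vector: names are the old labels, values are the new codes
lookup <- unlist(ref)

# Index by label; as.character() keeps this correct if Gender is a factor
recoded <- data
recoded$Gender <- unname(lookup[as.character(recoded$Gender)])

# Assigning the subsetted list directly stores a list column instead
as_list <- data
as_list$Gender <- ref[as.character(as_list$Gender)]

is.numeric(recoded$Gender)  # TRUE: an ordinary numeric column
is.list(as_list$Gender)     # TRUE: prints the same, but each cell is a list element
```

The list column prints like the numeric one, but functions that expect an atomic vector may not accept it, so the unlist() version is usually the safer choice. As for the original attempt, the error message says recode was handed the whole data frame; it operates on a vector, so passing data$Gender instead of data as the first argument should address that error.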

How to use `grm` in `ltm` package?

I'm trying to run grm in ltm package. My script is as follows:
library (ltm)
library (msm)
library (polycor)
dim(data)
head(data)
str(data)
descript(data)
options(max.print=1000000)
rcor.test(data, method = "pearson")
data_2 <- data
data_2[] <- lapply(data_2, factor)
out <- grm(data_2)
out2 <- grm(data_2, constrained = TRUE)
anova(out2,out)
margins(out)
However, when I run margins(out) I get this error: Error in exp[ind] <- n * colSums(GHw * pp) : subscript out of bounds
Would someone please explain this? And how can I resolve this?
I have 35 items in my questionnaire and 576 responders. Here is an example of the data (first 6 responders and first 6 items).
pespd_qa1 pespd_qa2 pespd_qa3 pespd_qa4 pespd_qa5 pespd_qa6
1 9 5 7 4 1 3
2 5 0 9 6 0 8
3 5 3 5 6 3 5
4 7 5 4 3 1 1
5 2 3 0 0 0 0
6 10 1 8 2 2 5

Attempting to remove a row in R using variable names

I am trying to remove some rows in a for loop in R. The conditional involves comparing it to the line below it, so I can't filter within the brackets.
I know that I can remove a row when a constant is specified: dataframe[-2, ]. I just want to do the same with a variable: dataframe[-x, ]. Here's the full loop:
for (j in 1:(nrow(referrals) - 1)) {
k <- j + 1
if (referrals[j, "Client ID"] == referrals[k, "Client ID"] &
referrals[j, "Provider SubCode"] == referrals[k, "Provider SubCode"]) {
referrals[-k, ]
}
}
The code runs without complaint, but no rows are removed (and I know some should be). Of course, if I test it with a constant, it works fine: referrals[-2, ].
You need to add a reproducible example for people to work with. I don't know the structure of your data, so I can only guess if this will work for you. I would not use a loop, for the reasons pointed out in the comments. I would identify the rows to remove first, and then remove them using normal means. Consider:
set.seed(4499) # this makes the example exactly reproducible
d <- data.frame(Client.ID = sample.int(4, 20, replace=TRUE),
Provider.SubCode = sample.int(4, 20, replace=TRUE))
d
# Client.ID Provider.SubCode
# 1 1 1
# 2 1 4
# 3 3 2
# 4 4 4
# 5 4 1
# 6 2 2
# 7 2 2 # redundant
# 8 3 1
# 9 4 4
# 10 3 4
# 11 1 3
# 12 1 3 # redundant
# 13 3 4
# 14 1 2
# 15 3 2
# 16 4 4
# 17 3 4
# 18 2 2
# 19 4 1
# 20 3 3
redundant.rows <- with(d, Client.ID[1:(nrow(d)-1)]==Client.ID[2:nrow(d)] &
Provider.SubCode[1:(nrow(d)-1)]==Provider.SubCode[2:nrow(d)] )
d[-c(which(redundant.rows)+1),]
# Client.ID Provider.SubCode
# 1 1 1
# 2 1 4
# 3 3 2
# 4 4 4
# 5 4 1
# 6 2 2
# 8 3 1 # 7 is missing
# 9 4 4
# 10 3 4
# 11 1 3
# 13 3 4 # 12 is missing
# 14 1 2
# 15 3 2
# 16 4 4
# 17 3 4
# 18 2 2
# 19 4 1
# 20 3 3
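The loop in the question has no visible effect because referrals[-k, ] returns a new data frame that is never assigned back to referrals; and if it were assigned inside the loop, the row indices would shift after each removal. A sketch of the same adjacent-row comparison applied to column names containing spaces, as in the question; the six rows of data here are made up for illustration:

```r
# Made-up data; check.names = FALSE keeps the spaces in the column names
referrals <- data.frame(
  "Client ID"        = c(1, 1, 2, 2, 2, 3),
  "Provider SubCode" = c("A", "A", "B", "C", "C", "A"),
  check.names = FALSE
)

n <- nrow(referrals)

# TRUE where a row matches the row directly above it on both columns;
# the first row has nothing above it, so it is never flagged
same_as_previous <- c(
  FALSE,
  referrals[-1, "Client ID"] == referrals[-n, "Client ID"] &
    referrals[-1, "Provider SubCode"] == referrals[-n, "Provider SubCode"]
)

# Assign the result back, otherwise nothing changes
referrals <- referrals[!same_as_previous, ]
referrals
```

With this data, rows 2 and 5 are dropped and rows 1, 3, 4 and 6 remain.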
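Using the information you have given, I believe this could be a good alternative. Note that duplicated() compares whole rows and flags repeats anywhere in the data frame, not only on adjacent rows: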
duplicated.rows <- duplicated(referrals)
Then, if you want the duplicated results run:
referrals.double <- referrals[duplicated.rows, ]
However, if you want the non duplicated results run:
referrals.not.double <- referrals[!duplicated.rows, ]
If you only want to compare the two key columns rather than whole rows:
key.columns <- referrals[, c("Client ID", "Provider SubCode")]
duplicated.keys <- duplicated(key.columns)
referrals.not.double <- referrals[!duplicated.keys, ]

Repeat vector to fill down column in data frame

Seems like this very simple maneuver used to work for me, and now it simply doesn't. A dummy version of the problem:
df <- data.frame(x = 1:5) # create simple dataframe
df
x
1 1
2 2
3 3
4 4
5 5
df$y <- c(1:5) # adding a new column with a vector of the exact same length. Works out like it should
df
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
df$z <- c(1:4) # trying to add a new column, this time with a vector with fewer elements than there are rows in the dataframe.
Error in `$<-.data.frame`(`*tmp*`, "z", value = 1:4) :
replacement has 4 rows, data has 5
I was expecting this to work with the following result:
x y z
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 1
I.e. the shorter vector should just start repeating itself automatically. I'm pretty certain this used to work for me (it's in a script that I've been running a hundred times before without problems). Now I can't even get the above dummy example to work like I want to. What am I missing?
If the vector can be evenly recycled into the data.frame, you do not get an error or a warning:
df <- data.frame(x = 1:10)
df$z <- 1:5
This may be what you were experiencing before.
You can get your vector to fit as you mention with rep_len:
df$y <- rep_len(1:3, length.out=10)
This results in
df
x z y
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 1
5 5 5 2
6 6 1 3
7 7 2 1
8 8 3 2
9 9 4 3
10 10 5 1
Note that in place of rep_len, you could use the more common rep function:
df$y <- rep(1:3,len=10)
From the help file for rep:
rep.int and rep_len are faster simplified versions for two common cases. They are not generic.
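Applied to the five-row example from the question, a short sketch of the same idea; passing nrow(df) as the target length avoids hard-coding the number of rows:

```r
df <- data.frame(x = 1:5)

# Repeat 1:4 until it fills every row of df, truncating the last cycle
df$z <- rep_len(1:4, nrow(df))
df
#   x z
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 1
```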
If the total number of rows is a multiple of the length of your new vector, it works fine. When it is not, recycling does not work everywhere. In particular, you have probably used this type of recycling with matrices:
data.frame(1:6, 1:3, 1:4) # not a multiple
# Error in data.frame(1:6, 1:3, 1:4) :
# arguments imply differing number of rows: 6, 3, 4
data.frame(1:6, 1:3) # a multiple
# X1.6 X1.3
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 1
# 5 5 2
# 6 6 3
cbind(1:6, 1:3, 1:4) # works even with not a multiple
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] 2 2 2
# [3,] 3 3 3
# [4,] 4 1 4
# [5,] 5 2 1
# [6,] 6 3 2
# Warning message:
# In cbind(1:6, 1:3, 1:4) :
# number of rows of result is not a multiple of vector length (arg 3)

Performing calculations on binned counts in R

I have a dataset stored in a text file in the format of bins of values followed by counts, like this:
var_a 1:5 5:12 7:9 9:14 ...
indicating that var_a took on the value 1 5 times in the dataset, 5 12 times, etc. Each variable is on its own line in that format.
I'd like to be able to perform calculations on this dataset in R, like quantiles, variance, and so on. Is there an easy way to load the data from the file and calculate these statistics? Ultimately I'd like to make a box-and-whisker plot for each variable.
Cheers!
You could use readLines to read in the data file
.x <- readLines(datafile)
I will create some dummy data, as I don't have the file. This should be the equivalent of the output of readLines
## dummy
.x <- c("var_a 1:5 5:12 7:9 9:14", 'var_b 1:5 2:12 3:9 4:14')
I split on spaces to separate each variable name from its value:count pairs
#split by space
space_split <- strsplit(.x, ' ')
# get the variable names (first in each list)
variable_names <- lapply(space_split,'[[',1)
# get the variable contents (everything but the first element in each list)
variable_contents <- lapply(space_split,'[',-1)
# a function to do the appropriate replicates
do_rep <- function(x){rep.int(x[1],x[2])}
# recreate the variables
variables <- lapply(variable_contents, function(x){
.list <- strsplit(x, ':')
unlist(lapply(lapply(.list, as.numeric), do_rep))
})
names(variables) <- variable_names
you could get the variance for each variable using
lapply(variables, var)
## $var_a
## [1] 6.848718
##
## $var_b
## [1] 1.138462
or get boxplots
boxplot(variables)
Not knowing the actual form that your data is in, I would probably use something like readLines to get each line in as a vector, then do something like the following:
# Some sample data
temp = c("var_a 1:5 5:12 7:9 9:14",
"var_b 1:7 4:9 3:11 2:10",
"var_c 2:5 5:14 6:6 3:14")
# Extract the names
NAMES = gsub("[0-9: ]", "", temp)
# Extract the data
temp_1 = strsplit(temp, " |:")
temp_1 = lapply(temp_1, function(x) as.numeric(x[-1]))
# "Expand" the data
temp_1 = lapply(1:length(temp_1),
function(x) rep(temp_1[[x]][seq(1, length(temp_1[[x]]), by=2)],
temp_1[[x]][seq(2, length(temp_1[[x]]), by=2)]))
names(temp_1) = NAMES
temp_1
# $var_a
# [1] 1 1 1 1 1 5 5 5 5 5 5 5 5 5 5 5 5 7 7 7 7 7 7 7 7 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9
#
# $var_b
# [1] 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 2
#
# $var_c
# [1] 2 2 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 3 3 3 3 3 3 3 3 3 3 3 3 3 3
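Both answers expand the counts into full vectors. As a complement, a sketch that keeps the value:count pairs and computes the variance directly from them, expanding only for quantiles and the boxplot. It reuses the dummy lines from the first answer; parse_binned and binned_var are hypothetical helper names, not functions from any package:

```r
# Dummy input, standing in for readLines(datafile)
lines <- c("var_a 1:5 5:12 7:9 9:14",
           "var_b 1:5 2:12 3:9 4:14")

# Split one line into its name, the bin values and the counts
parse_binned <- function(line) {
  fields <- strsplit(line, " +")[[1]]
  pairs  <- do.call(rbind, strsplit(fields[-1], ":", fixed = TRUE))
  list(name  = fields[1],
       value = as.numeric(pairs[, 1]),
       count = as.integer(pairs[, 2]))
}

parsed <- lapply(lines, parse_binned)
names(parsed) <- vapply(parsed, function(p) p$name, character(1))

# Sample variance from counts, without expanding the data
binned_var <- function(p) {
  n <- sum(p$count)
  m <- sum(p$value * p$count) / n
  sum(p$count * (p$value - m)^2) / (n - 1)
}
variances <- vapply(parsed, binned_var, numeric(1))

# Expand only where a full vector is needed
expanded  <- lapply(parsed, function(p) rep(p$value, times = p$count))
quartiles <- lapply(expanded, quantile, probs = c(0.25, 0.5, 0.75))

# One box per variable, drawn only in an interactive session
if (interactive()) boxplot(expanded)
```

The count-based variance should agree with var() on the expanded vectors, which gives a quick check that the parsing is right.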
