Hmisc::describe not working with dataframe - r

I need a "describe" function that reports values with more than two decimal places and so I thought I would use the Hmisc describe function but even when using the sample code from http://www.inside-r.org/packages/cran/Hmisc/docs/describe I get an error:
> dfr <- data.frame(x=rnorm(400),y=sample(c('male','female'),400,TRUE))
> Hmisc::describe(dfr)
Error in UseMethod("describe") :
no applicable method for 'describe' applied to an object of class "data.frame"
> psych::describe(dfr)
vars n mean sd median trimmed mad min max range skew kurtosis se
x 1 400 0.07 0.96 0.07 0.07 0.94 -2.41 2.76 5.17 0.02 -0.3 0.05
y* 2 400 1.50 0.50 2.00 1.50 0.00 1.00 2.00 1.00 -0.01 -2.0 0.03
Any suggestions as to why it should be doing this?

You are trying to use describe in a way that is not supported. Just use:
require(Hmisc) # or library(Hmisc)
describe(mydataframe)
To get even better output install LaTeX and run
latex(describe(mydataframe), file='')
# file='' to put LaTeX code inline as for knitr

Related

Why does manova() not give me p values?

Whenever I use manova(), then summary.aov(), I only get df, Sum sq, and Mean Sq, with no p value.
My data frame looks like: (sorry I'm not sure if there's a better way to display this!)
subtype lymphocytosis anemia thrombocytopenia eosinophilia hypercalcemia hyperglobulinemia
1 MBC 0.60 0.18 0.17 0.02 0.01 0.04
2 SBC 0.25 0.18 0.14 0.03 0.02 0.12
3 BCLL 1.00 0.29 0.18 0.08 0.03 0.21
neutrophilia neutropenia lymphadenopathy_peripheral lymphadenopathy_visceral splenomegaly
1 0.23 0.02 1.00 0.65 0.60
2 0.22 0.04 0.99 0.62 0.49
3 0.23 0.04 0.40 0.25 0.49
hepatomegaly pleural_effusion peritoneal_effusion intestinal_mass mediastinal_mass pulmonary_mass
1 0.41 0.02 0.05 0.10 0.09 0.22
2 0.37 0.03 0.05 0.17 0.12 0.22
3 0.27 0.01 0.04 0.25 0.03 0.25
The values in the data frame represent the mean number of cases from each subtype for each clinical sign. I am a little worried that, for manova() to work, I should have each individual case and its clinical signs entered so that manova() can do its own math. That would be a huge pain for me to assemble, which is why I've done it this way. Either way, I still think I should be getting p values; they just might be wrong if my data frame is wrong?
The code I am using is:
cs_comp_try <- manova(cbind(lymphocytosis, anemia, thrombocytopenia, eosinophilia, hypercalcemia,
hyperglobulinemia, neutrophilia, neutropenia, lymphadenopathy_peripheral, lymphadenopathy_visceral,
splenomegaly, hepatomegaly, pleural_effusion, peritoneal_effusion, intestinal_mass, mediastinal_mass, pulmonary_mass) ~ subtype, data = cs_comp)
summary(cs_comp_try)
summary.aov(cs_comp_try)
The result I get for summary.aov() is:
Response peritoneal_effusion :
Df Sum Sq Mean Sq
subtype 2 6.6667e-05 3.3333e-05
Response intestinal_mass :
Df Sum Sq Mean Sq
subtype 2 0.011267 0.0056333
Response mediastinal_mass :
Df Sum Sq Mean Sq
subtype 2 0.0042 0.0021
Response pulmonary_mass :
Df Sum Sq Mean Sq
subtype 2 6e-04 3e-04
I think I've replicated all the examples I've seen on the internet, so I'm not sure why I'm not getting an F statistic and p value when I run this code.
You can just use the summary function to get the p-values like this (I use iris data as an example):
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
summary(fit)
#> Df Pillai approx F num Df den Df Pr(>F)
#> Species 2 0.9885 71.829 4 294 < 2.2e-16 ***
#> Residuals 147
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Created on 2022-07-15 by the reprex package (v2.0.1)
If you want to extract the actual p-values, you can use the following code:
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
summary(fit)$stats[1, "Pr(>F)"]
#> [1] 2.216888e-42
Created on 2022-07-15 by the reprex package (v2.0.1)
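On the worry raised in the question about feeding manova() one row of means per subtype: a minimal sketch, using iris again rather than the original data, of what happens to the residual degrees of freedom in that layout. The names fit_full, iris_means and fit_means are made up for this illustration. With as many rows as groups there is nothing left to estimate the error term from, which would be consistent with summary.aov() printing only Df, Sum Sq and Mean Sq.

```r
# Full data: 50 flowers per species, so residual degrees of freedom exist.
fit_full <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
fit_full$df.residual   # 147, matching the Residuals row in the output above

# Collapse to one row of means per species, mimicking the layout in the question.
iris_means <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                        data = iris, FUN = mean)
fit_means <- manova(cbind(Sepal.Length, Petal.Length) ~ Species,
                    data = iris_means)
fit_means$df.residual  # 0: three rows, three groups, no residual variation

# With zero residual df there is no error mean square to form an F ratio from.
summary.aov(fit_means)
```

If that is what is going on, the per-case data (one row per patient) would be needed before any F statistic or p value can be computed.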

Creating an igraph from weighted correlation matrix csv

First of all, I'd like to say that I'm completely new to R, and I'm just trying to accomplish this one task.
So, what I'm trying to do is create a network diagram from a weighted matrix. I made an example:
The CSV is a simple correlation matrix that looks like this:
,A,B,C,D,E,F,G
A,1,0.9,0.64,0.43,0.38,0.33,0.33
B,0.9,1,0.64,0.33,0.43,0.38,0.38
C,0.64,0.64,1,0.59,0.69,0.64,0.64
D,0.43,0.33,0.59,1,0.28,0.23,0.28
E,0.38,0.43,0.69,0.28,1,0.95,0.9
F,0.33,0.38,0.64,0.23,0.95,1,0.9
G,0.33,0.38,0.64,0.28,0.9,0.9,1
I tried to draw the wanted result by myself and came up with this:
To be more precise, I drew the diagram first, then, using a ruler, I took note of the distances, calculated an equation to get the weights and made the CSV table.
The higher the value is, the closer the two points are to each other.
However, whatever I do, the best result I get is this:
And this is how I'm trying to accomplish it, using this tutorial:
First of all, I import my matrix:
> matrix <- read.csv(file = 'test_dataset.csv')
But after printing the matrix out with head(), this already somehow cuts the last line of the matrix:
> head(matrix)
ï.. A B C D E F G
1 A 1.00 0.90 0.64 0.43 0.38 0.33 0.33
2 B 0.90 1.00 0.64 0.33 0.43 0.38 0.38
3 C 0.64 0.64 1.00 0.59 0.69 0.64 0.64
4 D 0.43 0.33 0.59 1.00 0.28 0.23 0.28
5 E 0.38 0.43 0.69 0.28 1.00 0.95 0.90
6 F 0.33 0.38 0.64 0.23 0.95 1.00 0.90
> dim(matrix)
[1] 7 8
I then proceed with removing the first column so the matrix is square again...
> matrix <- data.matrix(matrix)[,-1]
> head(matrix)
A B C D E F G
[1,] 1.00 0.90 0.64 0.43 0.38 0.33 0.33
[2,] 0.90 1.00 0.64 0.33 0.43 0.38 0.38
[3,] 0.64 0.64 1.00 0.59 0.69 0.64 0.64
[4,] 0.43 0.33 0.59 1.00 0.28 0.23 0.28
[5,] 0.38 0.43 0.69 0.28 1.00 0.95 0.90
[6,] 0.33 0.38 0.64 0.23 0.95 1.00 0.90
> dim(matrix)
[1] 7 7
Then I create the graph and try to plot it:
> network <- graph_from_adjacency_matrix(matrix, weighted=T, mode="undirected", diag=F)
> plot(network)
And the result above appears...
So, after spending the last few hours googling and trying way, way more things, this is the closest I've been able to get to.
So I'm asking for your help, thank you very much!
This is all fine.
head() just prints out the first 6 rows of a matrix or data frame. If you want to see all of it, use print() or just type the name of the matrix variable.
graph_from_adjacency_matrix produces a link between two nodes if the value is non-zero. That's why you are getting every node linked to every other node.
To get what that tutorial is doing you need to add a line like
matrix[matrix<0.5] <- 0
to remove the edges for correlations below a cut off before you create the graph.
It's still not going to produce a chart like your hand-drawn one (where closeness roughly tracks the correlation); it will just clump nodes together if their correlation is above 0.5.
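Putting those points together, here is a rough sketch under a few assumptions. The CSV from the question is inlined as text so the example runs on its own, the names csv_text, cors, adj and coords are invented here, and the classical MDS layout (cmdscale on 1 - correlation) is one possible way to make closeness follow correlation, not something the tutorial does. The odd "ï.." column name in the question looks like a UTF-8 byte order mark; when reading the real file, passing fileEncoding = "UTF-8-BOM" to read.csv may help.

```r
# The correlation matrix from the question, inlined so the sketch is self-contained.
csv_text <- ",A,B,C,D,E,F,G
A,1,0.9,0.64,0.43,0.38,0.33,0.33
B,0.9,1,0.64,0.33,0.43,0.38,0.38
C,0.64,0.64,1,0.59,0.69,0.64,0.64
D,0.43,0.33,0.59,1,0.28,0.23,0.28
E,0.38,0.43,0.69,0.28,1,0.95,0.9
F,0.33,0.38,0.64,0.23,0.95,1,0.9
G,0.33,0.38,0.64,0.28,0.9,0.9,1"

# row.names = 1 treats the first column as row labels, so the result is
# already a square 7 x 7 numeric matrix and no column has to be dropped.
cors <- as.matrix(read.csv(text = csv_text, row.names = 1))
print(cors)  # prints all 7 rows; head() would stop after 6

# Drop weak correlations so that only pairs at or above the cut-off get an edge.
adj <- cors
adj[adj < 0.5] <- 0
diag(adj) <- 0
n_edges <- sum(adj[upper.tri(adj)] > 0)

# Assumption: treat 1 - correlation as a distance and let classical MDS
# place highly correlated nodes close together.
coords <- cmdscale(as.dist(1 - cors), k = 2)

# Plot only if igraph is installed.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(adj, weighted = TRUE,
                                           mode = "undirected", diag = FALSE)
  plot(g, layout = coords)
}
```

The thresholded matrix decides which edges are drawn, while the layout is computed from the full matrix, so even unconnected nodes are positioned by how strongly they correlate.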

R Function to get Confidence Interval of Difference Between Means

I am trying to find a function that allows me to easily get the confidence interval of the difference between two means.
I am pretty sure t.test has this functionality, but I haven't been able to make it work. Below is a screenshot of what I have tried so far:
Image
This is the dataset I am using
Indoor Outdoor
1 0.07 0.29
2 0.08 0.68
3 0.09 0.47
4 0.12 0.54
5 0.12 0.97
6 0.12 0.35
7 0.13 0.49
8 0.14 0.84
9 0.15 0.86
10 0.15 0.28
11 0.17 0.32
12 0.17 0.32
13 0.18 1.55
14 0.18 0.66
15 0.18 0.29
16 0.18 0.21
17 0.19 1.02
18 0.20 1.59
19 0.22 0.90
20 0.22 0.52
21 0.23 0.12
22 0.23 0.54
23 0.25 0.88
24 0.26 0.49
25 0.28 1.24
26 0.28 0.48
27 0.29 0.27
28 0.34 0.37
29 0.39 1.26
30 0.40 0.70
31 0.45 0.76
32 0.54 0.99
33 0.62 0.36
and I have been trying to use t.test function that has been installed from
install.packages("ggpubr")
I am pretty new to R, so sorry if there is a simple answer to this question. I have searched around quite a bit and haven't been able to find anything that I am looking for.
Note: The output I am looking for is Between -1.224 and 0.376
Edit:
The CI of difference between means I am looking for is if a random 34th datapoint was added to the chart by picking a random value in the Indoor column and a random value in the Outdoor column and duplicating it. Running the t.test will output the correct CI for the difference of means for the given sample size of 33.
How can I go about doing this pretending the sample size is 34?
there's probably something more convenient in the standard library, but it's pretty easy to calculate. given your df variable, we can just do:
# calculate mean of difference
d_mu <- mean(df$Indoor) - mean(df$Outdoor)
# calculate SD of difference
d_sd <- sqrt(var(df$Indoor) + var(df$Outdoor))
# calculate 95% CI of this
d_mu + d_sd * qt(c(0.025, 0.975), nrow(df)*2)
giving me: -1.2246 0.3767
Mostly for @AkselA: I often find it helpful to check my work by sampling simpler distributions. In this case I'd do something like:
a <- mean(df$Indoor) + sd(df$Indoor) * rt(1000000, nrow(df)-1)
b <- mean(df$Outdoor) + sd(df$Outdoor) * rt(1000000, nrow(df)-1)
quantile(a - b, c(0.025, 0.975))
which gives me answers much closer to the CI I gave in the comment
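For comparison, the question suspected that t.test already has this functionality. A minimal sketch of that route, with the data from the question typed in by hand: t.test is part of base R's stats package (no extra package needs installing), and by default it runs a Welch two-sample test whose conf.int element is the confidence interval for the difference in means. Whether the rows are paired is not stated in the question; if they are, paired = TRUE would be the relevant option. Note that the manual formula above uses standard deviations rather than standard errors, so it describes the spread of the difference between two single observations and will be wider than this interval.

```r
# Data from the question.
df <- data.frame(
  Indoor = c(0.07, 0.08, 0.09, 0.12, 0.12, 0.12, 0.13, 0.14, 0.15, 0.15, 0.17,
             0.17, 0.18, 0.18, 0.18, 0.18, 0.19, 0.20, 0.22, 0.22, 0.23, 0.23,
             0.25, 0.26, 0.28, 0.28, 0.29, 0.34, 0.39, 0.40, 0.45, 0.54, 0.62),
  Outdoor = c(0.29, 0.68, 0.47, 0.54, 0.97, 0.35, 0.49, 0.84, 0.86, 0.28, 0.32,
              0.32, 1.55, 0.66, 0.29, 0.21, 1.02, 1.59, 0.90, 0.52, 0.12, 0.54,
              0.88, 0.49, 1.24, 0.48, 0.27, 0.37, 1.26, 0.70, 0.76, 0.99, 0.36)
)

# Welch two-sample t-test (the default: var.equal = FALSE, paired = FALSE).
tt <- t.test(df$Indoor, df$Outdoor, conf.level = 0.95)

# 95% confidence interval for mean(Indoor) - mean(Outdoor).
ci <- tt$conf.int
print(ci)

# The point estimate sits inside the interval.
d_mu <- mean(df$Indoor) - mean(df$Outdoor)
print(d_mu)
```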
Even though I always find the approach of manually calculating the results, as shown by @Sam Mason, the most insightful, there are some who want a shortcut. And sometimes, it's also ok to be lazy :)
So among the different ways to calculate CIs, this is imho the most comfortable:
DescTools::MeanDiffCI(df$Indoor, df$Outdoor)
Here's a reprex:
IV <- ggplot2::diamonds$price  # the diamonds dataset ships with ggplot2
DV <- rnorm(length(IV), mean = mean(IV), sd = sd(IV))
DescTools::MeanDiffCI(IV, DV)
gives
meandiff lwr.ci upr.ci
-18.94825 -66.51845 28.62195
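As far as I can tell from the documentation, the default method = "classic" is the t-based interval and does no resampling. The argument R (999 by default) sets the number of bootstrap replicates and only matters when a bootstrap method such as "perc" is requested. If you want 1000 or more, you can set it like this:
DescTools::MeanDiffCI(IV, DV, method = "perc", R = 1000)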

Save unequal output to a csv or txt file

I want to save the following output I get in the R console into a csv or txt file.
Discordancy measures (critical value 3.00)
0.17 3.40 1.38 0.90 1.62 0.13 0.15 1.69 0.34 0.39 0.36 0.68 0.39
0.54 0.70 0.70 0.79 2.08 1.14 1.23 0.60 2.00 1.81 0.77 0.35 0.15
1.55 0.78 2.87 0.34
Heterogeneity measures (based on 100 simulations)
30.86 14.23 3.75
Goodness-of-fit measures (based on 100 simulations)
glo gev gno pe3 gpa
-3.72 -12.81 -19.80 -32.06 -37.66
This is the outcome I get when I run the following
Heter<-regtst(regsamlmu(-extremes), nsim=100)
where Heter is a list (i.e., is.list(Heter) returns TRUE)
You could use capture.output:
capture.output(regtst(regsamlmu(-extremes), nsim=100), file="myoutput.txt")
Or, to capture the output of several consecutive commands:
sink("myfile.txt")
#
# [commands generating desired output]
#
sink()
You could make a character vector which you write to a file. Each entry in the vector will be separated by a newline character.
out <- capture.output(regtst(regsamlmu(-extremes), nsim=100))
write(out, "output.txt", sep="\n")
If you would like to add more lines just do something like c(out, "hello Kostas")
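A small self-contained sketch of the capture.output approach, assuming a built-in dataset in place of the regtst result (which needs the original extremes data) and a temporary file in place of a real path. The names out_file and saved are only for illustration. It writes the printed text to a file and reads it back to confirm the lines match what the console showed.

```r
# Stand-in for the regtst() call: any object whose printed form we want to keep.
out <- capture.output(print(summary(cars)))

# Append an extra line, as suggested above, then write one entry per line.
out <- c(out, "hello Kostas")
out_file <- tempfile(fileext = ".txt")
writeLines(out, out_file)

# Read the file back to check that it holds exactly the captured lines.
saved <- readLines(out_file)
print(saved)
```

Because the captured output is free-form text rather than a table, a .txt file is usually the more natural target than a .csv.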

Descriptive Statistics of "timeSeries" structure data using psych package in R

Sorry for this stupid question, I am new to R. I have some data in the following format, saved as a CSV:
%Y-%m-%d,st1,st2,st3,st4,st5,st6,st7,st8,st9,st10
2005-09-20,38.75,48.625,48.5,23.667,45.5,48.75,18.75,33.25,43.455,76.042
2005-09-21,39.482,49.3,49,23.9,46.15,50.281,18.975,34.125,44.465,78.232
...
I import it in R
library(fPortfolio)
Data <- readSeries(file = "data.csv", header = TRUE, sep = ",")
I want to have some descriptive statistics
library(psych)
describe(Data)
Error in x[!is.na(x[, i]), i] :
invalid or not-yet-implemented 'timeSeries' subsetting
Any suggestion?
You probably want to convert it to a plain ts time series first:
tS <- dummySeries() #make quick dummy time series
describe(tS) # fails
but
newtS<-as.ts(tS)
describe(newtS) #works fine giving:
var n mean sd median trimmed mad min max range skew kurtosis se
Series 1 1 12 0.49 0.25 0.44 0.48 0.29 0.13 0.89 0.76 0.24 -1.52 0.07
Series 2 2 12 0.45 0.28 0.44 0.45 0.42 0.07 0.83 0.77 0.03 -1.74 0.08
