group_by prevents correct formatting with pandoc - r

Also posted as an issue on GitHub.
After using group_by, I cannot get pander to output the table correctly with the digits= or round= parameters.
Take the group_by out of the chain and the table displays just fine. Add the group_by back in and the floating-point numbers are printed with far too many decimal places to display.
# test dataframe
dat <- data.frame(matrix(rnorm(10 * 10), 10))
group <- rbinom(10, 20, .1)
df1 <- cbind(group, dat)
library(pander)
pander(df1, digits = 2, keep.line.breaks = TRUE, split.table = Inf,
       caption = "Not Grouped, correct format")
library(dplyr)
df2 <- df1 %>%
  group_by(group)
pander(df2, digits = 2, keep.line.breaks = TRUE, split.table = Inf,
       caption = "Grouped, incorrect format")
Is there a way around this?

As a workaround, you can convert the object df2 (a grouped tbl_df) to a plain data.frame object.
pander(as.data.frame(df2), digits = 2, keep.line.breaks = TRUE, split.table = Inf)
The result:
-----------------------------------------------------------------------
  group     X1    X2    X3    X4     X5    X6    X7     X8     X9   X10
------- ------ ----- ----- ----- ------ ----- ----- ------ ------ -----
      0  -0.55 -0.13 -0.71  -1.3 -0.096  0.49  0.73  -0.53   0.17 -0.44
      2   -1.5   1.4  -2.1  0.96   -0.2 -0.36  0.33    0.2   0.67 -0.27
      1   -2.3 -0.98  -1.5   1.1   0.87 -0.54   1.2  -0.24   0.31 -0.76
      1   0.24 0.086 -0.78  0.39  -0.17  -0.2  -1.5   -1.1   -1.3 -0.72
      0    0.2  -1.2  0.27   2.1   0.73   1.8 -0.12  -0.45   0.07 -0.29
      1  0.022 0.084 -0.41  0.32 -0.023  0.38  0.57  -0.16 0.0011 -0.76
      2   0.99   0.7 -0.32 -0.25  -0.17 -0.68 -0.59   0.29   0.77 -0.12
      3   -1.3  -1.6 -0.14  0.49   0.61   1.2  0.14 -0.087   -1.2 -0.95
      0 -0.073 -0.86     2 -0.87   0.51  -1.3 -0.94  0.022    0.6  0.68
      3    1.8 -0.81  -0.4  0.72    2.1  0.19 0.086    1.7   0.19 -0.49
-----------------------------------------------------------------------
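The workaround can be checked by comparing the classes of the two objects. This is only a minimal sketch: it assumes dplyr is installed, the set.seed(1) call is not in the original and is there purely for reproducibility, and the pander call is guarded in case pander is not available.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
dat   <- data.frame(matrix(rnorm(10 * 10), 10))
group <- rbinom(10, 20, 0.1)
df2   <- cbind(group, dat) %>% group_by(group)

# group_by() returns a grouped tibble, not a plain data.frame
class(df2)

# as.data.frame() drops the grouping and the tibble classes
plain <- as.data.frame(df2)
class(plain)

# Per the answer above, pander honours digits = 2 on the plain data.frame
if (requireNamespace("pander", quietly = TRUE)) {
  pander::pander(plain, digits = 2, keep.line.breaks = TRUE, split.table = Inf)
}
```

The conversion only changes the class of the object; the rows and columns stay the same, so the grouped df2 can still be used elsewhere in the chain.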

Related

Show different data in top and bottom of Rcirclize

I have 2 dataframes with different number of rows and columns, and I'd like to show both of them in a circos plot with circlize.
My data looks like this:
df1=data.frame(replicate(7,sample(-200:200,200,rep=TRUE))/100)
df2=data.frame(replicate(2,sample(-200:200,200,rep=TRUE))/100)
#head(df1)
X1 X2 X3 X4 X5 X6 X7
1 -0.03 0.63 -0.33 0.73 -1.37 -1.39 1.96
2 -1.81 -1.24 -1.63 1.58 0.13 1.39 -0.76
3 0.02 -2.00 -1.93 -1.35 1.06 -0.58 -0.77
4 -1.11 -1.38 -0.66 -0.40 1.69 -0.47 -1.55
5 0.98 0.06 0.00 -0.35 1.97 1.74 0.72
6 1.51 -1.68 -0.44 -1.74 0.15 0.26 0.36
#head(df2)
X1 X2
1 0.16 -0.81
2 -1.38 -0.16
3 -0.22 -0.74
4 0.73 -0.82
5 0.58 -1.87
6 -0.63 1.50
I want to build a single circos plot in which the top half shows df1 and the bottom half shows df2, but so far I can only plot each data frame on its own. For instance, this is how I show df1:
library(circlize)
col_fun1 = colorRamp2(c(min(df1), 0, max(df1)), c("blue", "white", "red"))
circos.heatmap(df1, col = col_fun1, cluster = TRUE, track.height = 0.2, rownames.side = "outside", rownames.cex = 0.6)
circos.clear()
How can I show df1 only in the top half and df2 only in the bottom half?

How to retrieve observation scores for each Principal Component in R using principal Function

pc_unrotate = principal(correlate1,nfactors = 4,rotate = "none")
print(pc_unrotate)
output:
Principal Components Analysis
Call: principal(r = correlate1, nfactors = 4, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 PC3 PC4 h2 u2 com
ProdQual 0.25 -0.50 -0.08 0.67 0.77 0.232 2.2
Ecom 0.31 0.71 0.31 0.28 0.78 0.223 2.1
TechSup 0.29 -0.37 0.79 -0.20 0.89 0.107 1.9
CompRes 0.87 0.03 -0.27 -0.22 0.88 0.119 1.3
Advertising 0.34 0.58 0.11 0.33 0.58 0.424 2.4
ProdLine 0.72 -0.45 -0.15 0.21 0.79 0.213 2.0
SalesFImage 0.38 0.75 0.31 0.23 0.86 0.141 2.1
ComPricing -0.28 0.66 -0.07 -0.35 0.64 0.359 1.9
WartyClaim 0.39 -0.31 0.78 -0.19 0.89 0.108 2.0
OrdBilling 0.81 0.04 -0.22 -0.25 0.77 0.234 1.3
DelSpeed 0.88 0.12 -0.30 -0.21 0.91 0.086 1.4
PC1 PC2 PC3 PC4
SS loadings 3.43 2.55 1.69 1.09
Proportion Var 0.31 0.23 0.15 0.10
Cumulative Var 0.31 0.54 0.70 0.80
Proportion Explained 0.39 0.29 0.19 0.12
Cumulative Proportion 0.39 0.68 0.88 1.00
Mean item complexity = 1.9
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
Fit based upon off diagonal values = 0.97
Now I need to get the scores. I tried pc_unrotate$scores, but it returns NULL.
I also ran names(pc_unrotate) and found that there is no scores element. What can I do to get the PCA scores?
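Add the argument scores=TRUE to the principal() function call: https://www.rdocumentation.org/packages/psych/versions/1.9.12.31/topics/principal. Note that principal() can only compute scores from raw data; if correlate1 is a correlation matrix, as its name suggests, pass the original data instead.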
pc_unrotate = principal(correlate1,nfactors = 4,rotate = "none", scores = TRUE)
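To illustrate the difference between raw data and a correlation matrix as input, here is a small sketch. It assumes the psych package is installed and uses the built-in mtcars data as a stand-in, since correlate1 and the data behind it are not shown in the question.

```r
library(psych)

# Stand-in data: mtcars ships with R; replace it with your own raw data
raw_data <- mtcars

# Fitted on raw observations: component scores are available
pc_raw <- principal(raw_data, nfactors = 4, rotate = "none", scores = TRUE)
dim(pc_raw$scores)      # one row per observation, one column per component

# Fitted on a correlation matrix: there are no observations to score
pc_cor <- principal(cor(raw_data), nfactors = 4, rotate = "none", scores = TRUE)
is.null(pc_cor$scores)
```

If only the correlation matrix is at hand, the loadings and fit statistics are still valid, but scores need the original observations.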

loop a function in r and output the result to .csv file

Subject var1 var2 var3 var4 var5
1 0.2 0.78 7.21 0.5 0.47
1 0.52 1.8 11.77 -0.27 -0.22
1 0.22 0.84 7.32 0.35 0.36
2 0.38 1.38 10.05 -0.25 -0.2
2 0.56 1.99 13.76 -0.44 -0.38
3 0.35 1.19 7.23 -0.16 -0.06
4 0.09 0.36 4.01 0.55 0.51
4 0.29 1.08 9.48 -0.57 -0.54
4 0.27 1.03 9.42 -0.19 -0.21
4 0.25 0.9 7.06 0.12 0.12
5 0.18 0.65 5.22 0.41 0.42
5 0.15 0.57 5.72 0.01 0.01
6 0.26 0.94 7.38 -0.17 -0.13
6 0.14 0.54 5.13 0.16 0.17
6 0.22 0.84 6.97 -0.66 -0.58
6 0.18 0.66 5.79 0.23 0.25
# the above is sample data matrix (dat11)
# The following lines of function is to calculate the p-value (P.z) for a
# variable pair var2 and var3 using lmer().
fit1 <- lmer(var2 ~ var3 + (1|Subject), data = dat11)
summary(fit1)
coefs <- data.frame(coef(summary(fit1)))
# use normal distribution to approximate p-value
coefs$p.z <- 2 * (1 - pnorm(abs(coefs$t.value)))
round(coefs,6)
# the following is the result
Estimate Std.Error t.value p.z
(Intercept) -0.280424 0.110277 -2.542913 0.010993
var3 0.163764 0.013189 12.417034 0.000000
The real data contains 65 variables (var1, var2, ..., var65). I would like to use the code above to get this result for all possible pairs of the 65 variables, e.g. var1 ~ var2, var1 ~ var3, ..., var1 ~ var65; var2 ~ var3, var2 ~ var4, ..., var2 ~ var65; var3 ~ var4, and so on. There will be about 2000 pairs. Can somebody help me with the loop code and with writing the results to a .csv file? Thank you.
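One possible way to set up such a loop is sketched below, using the sample data from the question. It assumes lme4 is installed; the helper fit_pair() and the output file name pairwise_pz.csv are hypothetical names chosen for illustration. combn() enumerates each unordered pair once, which gives choose(65, 2) = 2080 pairs for 65 variables.

```r
library(lme4)

# Sample data from the question (dat11)
dat11 <- read.table(header = TRUE, text = "
Subject var1 var2 var3 var4 var5
1 0.2  0.78  7.21  0.5   0.47
1 0.52 1.8  11.77 -0.27 -0.22
1 0.22 0.84  7.32  0.35  0.36
2 0.38 1.38 10.05 -0.25 -0.2
2 0.56 1.99 13.76 -0.44 -0.38
3 0.35 1.19  7.23 -0.16 -0.06
4 0.09 0.36  4.01  0.55  0.51
4 0.29 1.08  9.48 -0.57 -0.54
4 0.27 1.03  9.42 -0.19 -0.21
4 0.25 0.9   7.06  0.12  0.12
5 0.18 0.65  5.22  0.41  0.42
5 0.15 0.57  5.72  0.01  0.01
6 0.26 0.94  7.38 -0.17 -0.13
6 0.14 0.54  5.13  0.16  0.17
6 0.22 0.84  6.97 -0.66 -0.58
6 0.18 0.66  5.79  0.23  0.25
")

# Every unordered pair of variables, e.g. (var1, var2), (var1, var3), ...
vars      <- setdiff(names(dat11), "Subject")
var_pairs <- combn(vars, 2, simplify = FALSE)

# Hypothetical helper: fit response ~ predictor + (1 | Subject) for one pair
# and return the coefficient table with the normal-approximation p-value
fit_pair <- function(pair, data) {
  form <- as.formula(sprintf("%s ~ %s + (1 | Subject)", pair[1], pair[2]))
  tryCatch({
    cf <- coef(summary(lmer(form, data = data)))
    data.frame(response  = pair[1],
               predictor = pair[2],
               term      = rownames(cf),
               estimate  = cf[, "Estimate"],
               std.error = cf[, "Std. Error"],
               t.value   = cf[, "t value"],
               p.z       = 2 * (1 - pnorm(abs(cf[, "t value"]))),
               row.names = NULL)
  }, error = function(e) NULL)  # skip pairs whose model cannot be fitted
}

results <- do.call(rbind, lapply(var_pairs, fit_pair, data = dat11))

out_file <- "pairwise_pz.csv"  # hypothetical output file name
write.csv(results, out_file, row.names = FALSE)
```

With this many tests, a multiple-comparison adjustment such as p.adjust(results$p.z, method = "BH") may be worth considering before interpreting individual p-values. On small samples lmer may also report singular fits, which are messages rather than errors.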

Returning function output

I am trying to run this function and obtain its output. The inputs are vectors, and the function's result should also be a vector rather than a scalar. I tried the code below, but it is not working. I am new to R, and your assistance would be highly appreciated. The code follows:
park91a <- function(xx) {
  x1 <- xx[1]
  x2 <- xx[2]
  x3 <- xx[3]
  x4 <- xx[4]
  term1a <- x1/2
  term1b <- sqrt(abs(1 + (x2+x3^2)*x4/(x1^2))) - 1
  term1 <- term1a * term1b
  term2a <- x1 + 3*x4
  term2b <- exp(1 + sin(x3))
  term2 <- term2a * term2b
  real <- term1 + term2
  return(real)
}
data <- read.csv("TRANSISTORDATA.txt")
mydata <- na.omit(data)
mydata
inputs <- mydata[1:10,1:4]
y <- mydata[,7]
y <- y[1:10]
xx<- inputs
xx
xx <- c(inputs)
park91a(xx)
The inputs look like this:
x1 x2 x3 x4
1 0.21 -0.26 0.23 -0.21
2 -0.19 0.18 0.22 0.21
3 -0.19 -0.08 -0.28 -0.28
4 0.19 -0.25 0.28 0.28
5 -0.28 0.25 -0.22 -0.21
6 -0.22 0.21 0.17 0.16
7 -0.22 -0.12 0.27 -0.25
8 0.11 0.23 -0.27 0.24
9 -0.19 -0.19 -0.19 0.24
10 0.17 0.21 0.19 -0.24
If the input is a matrix, then xx[1] will be 0.21, xx[2] will be -0.19, and so on down the first column. If you wish to calculate over whole columns (I assume this is the intention, given the column names and the names used in the function), then you will need to edit the first lines of the function, for example:
x1 <- xx[,1]
x2 <- xx[,2]
x3 <- xx[,3]
x4 <- xx[,4]
You did not give your desired output, so your question is not totally clear to me. First of all, I reconstructed your data:
inputs <- read.table(header=TRUE, text=
" x1 x2 x3 x4
0.21 -0.26 0.23 -0.21
-0.19 0.18 0.22 0.21
-0.19 -0.08 -0.28 -0.28
0.19 -0.25 0.28 0.28
-0.28 0.25 -0.22 -0.21
-0.22 0.21 0.17 0.16
-0.22 -0.12 0.27 -0.25
0.11 0.23 -0.27 0.24
-0.19 -0.19 -0.19 0.24
0.17 0.21 0.19 -0.24")
Perhaps you want something like this:
within(inputs, {
term1a <- x1/2
term1b <- sqrt(abs(1 + (x2+x3^2)*x4/(x1^2))) - 1
term1 <- term1a * term1b
term2a <- x1 + 3*x4
term2b <- exp(1 + sin(x3))
term2 <- term2a * term2b
real <- term1 + term2
})
result:
# x1 x2 x3 x4 real term2 term2b term2a term1 term1b term1a
# 1 0.21 -0.26 0.23 -0.21 -1.3910343 -1.4340132 3.414317 -0.42 0.0429788836 0.409322701 0.105
# 2 -0.19 0.18 0.22 0.21 1.4377575 1.4877264 3.381196 0.44 -0.0499689622 0.525989076 -0.095
# 3 -0.19 -0.08 -0.28 -0.28 -2.1243796 -2.1237920 2.061934 -1.03 -0.0005876561 0.006185854 -0.095
# 4 0.19 -0.25 0.28 0.28 3.6507163 3.6910628 3.583556 1.03 -0.0403465463 -0.424700487 0.095
# 5 -0.28 0.25 -0.22 -0.21 -1.9113789 -1.9886573 2.185338 -0.91 0.0772783929 -0.551988521 -0.140
# 6 -0.22 0.21 0.17 0.16 0.7998736 0.8370334 3.219359 0.26 -0.0371597771 0.337816156 -0.110
# 7 -0.22 -0.12 0.27 -0.25 -3.4554087 -3.4427557 3.549233 -0.97 -0.0126529657 0.115026961 -0.110
# 8 0.11 0.23 -0.27 0.24 1.8185544 1.7279556 2.081874 0.83 0.0905987637 1.647250250 0.055
# 9 -0.19 -0.19 -0.19 0.24 1.2732947 1.1927515 2.250475 0.53 0.0805431677 -0.847822818 -0.095
# 10 0.17 0.21 0.19 -0.24 -1.8039939 -1.8058328 3.283332 -0.55 0.0018389314 0.021634487 0.085
or this:
inputs$real <- with(inputs,
x1/2 * (sqrt(abs(1 + (x2+x3^2)*x4/(x1^2))) - 1) +
(x1 + 3*x4) * exp(1 + sin(x3))
)
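Putting the two answers together, the function itself can be made to work on whole columns. The sketch below is one way to do it; the name park91a_rows is hypothetical, and it assumes the four inputs are the first four columns of a matrix or data frame. The original call most likely failed because c(inputs) turns a data frame into a list, so xx[1] is a one-element list rather than a number.

```r
# Hypothetical column-wise variant of park91a: one result per input row
park91a_rows <- function(xx) {
  xx <- as.matrix(xx)  # accepts a data frame or a matrix
  x1 <- xx[, 1]
  x2 <- xx[, 2]
  x3 <- xx[, 3]
  x4 <- xx[, 4]
  term1 <- (x1 / 2) * (sqrt(abs(1 + (x2 + x3^2) * x4 / x1^2)) - 1)
  term2 <- (x1 + 3 * x4) * exp(1 + sin(x3))
  unname(term1 + term2)
}

# Data reconstructed from the question
inputs <- read.table(header = TRUE, text = "
   x1    x2    x3    x4
 0.21 -0.26  0.23 -0.21
-0.19  0.18  0.22  0.21
-0.19 -0.08 -0.28 -0.28
 0.19 -0.25  0.28  0.28
-0.28  0.25 -0.22 -0.21
-0.22  0.21  0.17  0.16
-0.22 -0.12  0.27 -0.25
 0.11  0.23 -0.27  0.24
-0.19 -0.19 -0.19  0.24
 0.17  0.21  0.19 -0.24")

real <- park91a_rows(inputs)
real  # a numeric vector with one value per row of inputs
```

The values should match the real column shown in the within() result above, for example -1.3910343 for the first row.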

Principal components order - PCA in R

I'm trying to do PCA in R with principal. It works, but I'm curious why my principal components are not in numerical order. Why are they listed as PC2, PC3, PC1, PC4 in the output below rather than PC1, PC2, PC3, PC4? What is the reason for this?
tb2 <- principal(tba, nfactors = 4)
tb2
Principal Components Analysis
Call: principal(r = tba, nfactors = 4)
Standardized loadings (pattern matrix) based upon correlation matrix
PC2 PC3 PC1 PC4 h2 u2 com
bio1 0.89 0.28 0.32 -0.05 0.98 0.0248 1.5
bio2 -0.07 -0.22 0.09 0.96 0.99 0.0091 1.1
bio3 0.63 0.21 -0.22 0.60 0.85 0.1497 2.5
bio4 -0.60 -0.40 0.34 0.44 0.83 0.1682 3.3
bio5 0.78 0.15 0.46 0.33 0.95 0.0454 2.1
bio6 0.89 0.36 0.17 -0.21 0.99 0.0088 1.5
bio7 -0.50 -0.38 0.26 0.70 0.96 0.0395 2.8
bio8 0.85 0.12 0.20 -0.19 0.81 0.1896 1.3
bio9 0.85 0.24 0.41 0.03 0.95 0.0525 1.6
bio10 0.85 0.23 0.40 0.04 0.95 0.0533 1.6
bio11 0.90 0.34 0.21 -0.13 0.99 0.0058 1.4
bio12 0.16 0.94 0.03 -0.15 0.93 0.0743 1.1
bio13 0.29 0.93 0.18 -0.09 0.99 0.0086 1.3
bio14 -0.31 -0.18 -0.89 -0.05 0.92 0.0777 1.3
bio15 0.34 0.72 0.56 -0.02 0.94 0.0577 2.4
bio16 0.27 0.93 0.22 -0.10 0.99 0.0069 1.3
bio17 -0.17 -0.16 -0.93 -0.07 0.93 0.0725 1.1
bio18 -0.40 -0.29 -0.84 -0.06 0.96 0.0440 1.7
bio19 0.26 0.93 0.22 -0.09 0.99 0.0066 1.3
PC2 PC3 PC1 PC4
SS loadings 6.84 4.99 3.81 2.26
Proportion Var 0.36 0.26 0.20 0.12
Cumulative Var 0.36 0.62 0.82 0.94
Proportion Explained 0.38 0.28 0.21 0.13
Cumulative Proportion 0.38 0.66 0.87 1.00
Mean item complexity = 1.7
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.03
with the empirical chi square 96803.04 with prob < 0
Thanks in advance!
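A likely explanation, offered as a sketch rather than a definitive answer: principal() applies a varimax rotation by default, and as far as I know psych labels the components first and then re-sorts them by the variance they explain after rotation, so the labels can end up out of order. The example below assumes psych is installed and uses the built-in mtcars data as a stand-in, because tba is not available.

```r
library(psych)

dat <- mtcars  # stand-in data; the original `tba` is not shown in the question

# Default rotate = "varimax": columns are sorted by rotated SS loadings,
# so the component labels may not appear in numerical order
rotated <- principal(dat, nfactors = 4)
colnames(rotated$loadings)

# rotate = "none": components stay in extraction order
unrotated <- principal(dat, nfactors = 4, rotate = "none")
colnames(unrotated$loadings)
```

In either case the columns are ordered by the SS loadings row of the printed output, which is why the first column explains the most variance regardless of its label.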
