I am trying to run this function and obtain its output. The inputs are in vector form, and the response of the function should be a vector rather than a scalar. I tried the code below, but it is not working. I am new to R, and your assistance will be highly appreciated. The code is as follows:
park91a <- function(xx) {
x1 <- xx[1]
x2 <- xx[2]
x3 <- xx[3]
x4 <- xx[4]
term1a <- x1/2
term1b <- sqrt(abs(1 + (x2+x3^2)*x4/(x1^2))) - 1
term1 <- term1a * term1b
term2a <- x1 + 3*x4
term2b <- exp(1 + sin(x3))
term2 <- term2a * term2b
real <- term1 + term2
return(real)
}
data <- read.csv("TRANSISTORDATA.txt")
mydata <- na.omit(data)
mydata
inputs <- mydata[1:10,1:4]
y <- mydata[,7]
y <- y[1:10]
xx<- inputs
xx
xx <- c(inputs)
park91a(xx)
The inputs look like this:
x1 x2 x3 x4
1 0.21 -0.26 0.23 -0.21
2 -0.19 0.18 0.22 0.21
3 -0.19 -0.08 -0.28 -0.28
4 0.19 -0.25 0.28 0.28
5 -0.28 0.25 -0.22 -0.21
6 -0.22 0.21 0.17 0.16
7 -0.22 -0.12 0.27 -0.25
8 0.11 0.23 -0.27 0.24
9 -0.19 -0.19 -0.19 0.24
10 0.17 0.21 0.19 -0.24
If the input is a matrix, then xx[1] will be 0.21, xx[2] will be -0.19, and so on down the first column. If you wish to calculate over whole columns (I assume this is the intention, given the column names and the variable names in the function), then you will need to edit the first lines of the function, for example:
x1 <- xx[,1]
x2 <- xx[,2]
x3 <- xx[,3]
x4 <- xx[,4]
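Put together, the edited function might look like the sketch below. It assumes xx is a data frame or matrix whose first four columns are x1 to x4; the name park91a_cols is made up here to keep it apart from the original. Note that c(inputs) turns a data frame into a list of columns, so the data frame is passed to the function directly instead.

```r
# Hypothetical column-wise variant of park91a(): returns one value per row of xx
park91a_cols <- function(xx) {
  xx <- as.matrix(xx)  # accept either a data frame or a matrix
  x1 <- xx[, 1]
  x2 <- xx[, 2]
  x3 <- xx[, 3]
  x4 <- xx[, 4]
  term1 <- (x1 / 2) * (sqrt(abs(1 + (x2 + x3^2) * x4 / (x1^2))) - 1)
  term2 <- (x1 + 3 * x4) * exp(1 + sin(x3))
  term1 + term2
}

# First three rows of the inputs shown in the question
inputs <- data.frame(
  x1 = c(0.21, -0.19, -0.19),
  x2 = c(-0.26, 0.18, -0.08),
  x3 = c(0.23, 0.22, -0.28),
  x4 = c(-0.21, 0.21, -0.28)
)
res <- park91a_cols(inputs)
res
```

Because every operation in the body is vectorised, no loop over rows is needed.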
You did not give your desired output, so your question is not totally clear to me. First of all, I reconstructed your data:
inputs <- read.table(header=TRUE, text=
" x1 x2 x3 x4
0.21 -0.26 0.23 -0.21
-0.19 0.18 0.22 0.21
-0.19 -0.08 -0.28 -0.28
0.19 -0.25 0.28 0.28
-0.28 0.25 -0.22 -0.21
-0.22 0.21 0.17 0.16
-0.22 -0.12 0.27 -0.25
0.11 0.23 -0.27 0.24
-0.19 -0.19 -0.19 0.24
0.17 0.21 0.19 -0.24")
Eventually you want something like this:
within(inputs, {
term1a <- x1/2
term1b <- sqrt(abs(1 + (x2+x3^2)*x4/(x1^2))) - 1
term1 <- term1a * term1b
term2a <- x1 + 3*x4
term2b <- exp(1 + sin(x3))
term2 <- term2a * term2b
real <- term1 + term2
})
result:
# x1 x2 x3 x4 real term2 term2b term2a term1 term1b term1a
# 1 0.21 -0.26 0.23 -0.21 -1.3910343 -1.4340132 3.414317 -0.42 0.0429788836 0.409322701 0.105
# 2 -0.19 0.18 0.22 0.21 1.4377575 1.4877264 3.381196 0.44 -0.0499689622 0.525989076 -0.095
# 3 -0.19 -0.08 -0.28 -0.28 -2.1243796 -2.1237920 2.061934 -1.03 -0.0005876561 0.006185854 -0.095
# 4 0.19 -0.25 0.28 0.28 3.6507163 3.6910628 3.583556 1.03 -0.0403465463 -0.424700487 0.095
# 5 -0.28 0.25 -0.22 -0.21 -1.9113789 -1.9886573 2.185338 -0.91 0.0772783929 -0.551988521 -0.140
# 6 -0.22 0.21 0.17 0.16 0.7998736 0.8370334 3.219359 0.26 -0.0371597771 0.337816156 -0.110
# 7 -0.22 -0.12 0.27 -0.25 -3.4554087 -3.4427557 3.549233 -0.97 -0.0126529657 0.115026961 -0.110
# 8 0.11 0.23 -0.27 0.24 1.8185544 1.7279556 2.081874 0.83 0.0905987637 1.647250250 0.055
# 9 -0.19 -0.19 -0.19 0.24 1.2732947 1.1927515 2.250475 0.53 0.0805431677 -0.847822818 -0.095
# 10 0.17 0.21 0.19 -0.24 -1.8039939 -1.8058328 3.283332 -0.55 0.0018389314 0.021634487 0.085
or this:
inputs$real <- with(inputs,
x1/2 * (sqrt(abs(1 + (x2+x3^2)*x4/(x1^2))) - 1) +
(x1 + 3*x4) * exp(1 + sin(x3))
)
I have 2 dataframes with different number of rows and columns, and I'd like to show both of them in a circos plot with circlize.
My data looks like this:
df1=data.frame(replicate(7,sample(-200:200,200,rep=TRUE))/100)
df2=data.frame(replicate(2,sample(-200:200,200,rep=TRUE))/100)
#head(df1)
X1 X2 X3 X4 X5 X6 X7
1 -0.03 0.63 -0.33 0.73 -1.37 -1.39 1.96
2 -1.81 -1.24 -1.63 1.58 0.13 1.39 -0.76
3 0.02 -2.00 -1.93 -1.35 1.06 -0.58 -0.77
4 -1.11 -1.38 -0.66 -0.40 1.69 -0.47 -1.55
5 0.98 0.06 0.00 -0.35 1.97 1.74 0.72
6 1.51 -1.68 -0.44 -1.74 0.15 0.26 0.36
#head(df2)
X1 X2
1 0.16 -0.81
2 -1.38 -0.16
3 -0.22 -0.74
4 0.73 -0.82
5 0.58 -1.87
6 -0.63 1.50
I want to build a single circos plot where the top is showing df1 and bottom is showing df2, but I can only show individual dfs. For instance, this is how I show df1:
library(circlize)
col_fun1 <- colorRamp2(c(min(df1), 0, max(df1)), c("blue", "white", "red"))
circos.heatmap(df1, col = col_fun1, cluster = TRUE, track.height = 0.2, rownames.side = "outside", rownames.cex = 0.6)
circos.clear()
How can I show df1 only in the top half, and df2 only in the bottom half?
I am working on an EFA and would like to customize my tables. There is a function, print.psych, that can suppress factor loadings below a certain value to make the table easier to read. When I run this function, it prints the loadings and the summary statistics to the console (in an .Rmd document, it produces console text and a separate data frame of the factor loadings with small loadings suppressed). However, if I attempt to save the result as an object, it does not keep this data.
Here is an example:
library(psych)
bfi_data=bfi
bfi_data=bfi_data[complete.cases(bfi_data),]
bfi_cor <- cor(bfi_data)
factors_data <- fa(r = bfi_cor, nfactors = 6)
print.psych(factors_data, cut = .32, sort = TRUE)
In an R script, it produces this:
item MR2 MR3 MR1 MR5 MR4 MR6 h2 u2 com
N2 17 0.83 0.654 0.35 1.0
N1 16 0.82 0.666 0.33 1.1
N3 18 0.69 0.549 0.45 1.1
N5 20 0.47 0.376 0.62 2.2
N4 19 0.44 0.43 0.506 0.49 2.4
C4 9 -0.67 0.555 0.45 1.3
C2 7 0.66 0.475 0.53 1.4
C5 10 -0.56 0.433 0.57 1.4
C3 8 0.56 0.317 0.68 1.1
C1 6 0.54 0.344 0.66 1.3
In R Markdown, it produces this:
How can I save that data.frame as an object?
Looking at the str of the object, it doesn't look like what you want is built in. An ugly way would be to use capture.output and convert the resulting character vector to a data frame using string manipulation. Otherwise, since the data is being displayed, it must be present somewhere in the object itself. I found vectors of the same length that can be combined to form the data frame:
loadings <- unclass(factors_data$loadings)
h2 <- factors_data$communalities
#There is also factors_data$communality which has same values
u2 <- factors_data$uniquenesses
com <- factors_data$complexity
data <- cbind(loadings, h2, u2, com)
data
This returns :
# MR2 MR3 MR1 MR5 MR4 MR6 h2 u2 com
#A1 0.11 0.07 -0.07 -0.56 -0.01 0.35 0.38 0.62 1.85
#A2 0.03 0.09 -0.08 0.64 0.01 -0.06 0.47 0.53 1.09
#A3 -0.04 0.04 -0.10 0.60 0.07 0.16 0.51 0.49 1.26
#A4 -0.07 0.19 -0.07 0.41 -0.13 0.13 0.29 0.71 2.05
#A5 -0.17 0.01 -0.16 0.47 0.10 0.22 0.47 0.53 2.11
#C1 0.05 0.54 0.08 -0.02 0.19 0.05 0.34 0.66 1.32
#C2 0.09 0.66 0.17 0.06 0.08 0.16 0.47 0.53 1.36
#C3 0.00 0.56 0.07 0.07 -0.04 0.05 0.32 0.68 1.09
#C4 0.07 -0.67 0.10 -0.01 0.02 0.25 0.55 0.45 1.35
#C5 0.15 -0.56 0.17 0.02 0.10 0.01 0.43 0.57 1.41
#E1 -0.14 0.09 0.61 -0.14 -0.08 0.09 0.41 0.59 1.34
#E2 0.06 -0.03 0.68 -0.07 -0.08 -0.01 0.56 0.44 1.07
#E3 0.02 0.01 -0.32 0.17 0.38 0.28 0.51 0.49 3.28
#E4 -0.07 0.03 -0.49 0.25 0.00 0.31 0.56 0.44 2.26
#E5 0.16 0.27 -0.39 0.07 0.24 0.04 0.41 0.59 3.01
#N1 0.82 -0.01 -0.09 -0.09 -0.03 0.02 0.67 0.33 1.05
#N2 0.83 0.02 -0.07 -0.07 0.01 -0.07 0.65 0.35 1.04
#N3 0.69 -0.03 0.13 0.09 0.02 0.06 0.55 0.45 1.12
#N4 0.44 -0.14 0.43 0.09 0.10 0.01 0.51 0.49 2.41
#N5 0.47 -0.01 0.21 0.21 -0.17 0.09 0.38 0.62 2.23
#O1 -0.05 0.07 -0.01 -0.04 0.57 0.09 0.36 0.64 1.11
#O2 0.12 -0.09 0.01 0.12 -0.43 0.28 0.30 0.70 2.20
#O3 0.01 0.00 -0.10 0.05 0.65 0.04 0.48 0.52 1.06
#O4 0.10 -0.05 0.34 0.15 0.37 -0.04 0.24 0.76 2.55
#O5 0.04 -0.04 -0.02 -0.01 -0.50 0.30 0.33 0.67 1.67
#gender 0.20 0.09 -0.12 0.33 -0.21 -0.15 0.18 0.82 3.58
#education -0.03 0.01 0.05 0.11 0.12 -0.22 0.07 0.93 2.17
#age -0.06 0.07 -0.02 0.16 0.03 -0.26 0.10 0.90 2.05
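To mimic the cut argument of print.psych on a combined matrix like this one, small loadings can be blanked after rounding. The sketch below uses a tiny made-up matrix; toy, cut_loadings and the 0.32 threshold are illustrative names and values, and it assumes that cutting on the absolute value of the loading is what is wanted.

```r
# Hypothetical helper: blank out loadings whose absolute value is below `cut`.
# `m` is a numeric matrix, `loading_cols` names the factor-loading columns.
cut_loadings <- function(m, loading_cols, cut = 0.32) {
  out <- as.data.frame(round(m, 2))
  for (col in loading_cols) {
    small <- abs(out[[col]]) < cut
    # Loading columns become character so that "" can stand in for a blank
    out[[col]] <- ifelse(small, "", sprintf("%.2f", out[[col]]))
  }
  out
}

# Made-up two-item example (values echo rows N1 and C4 above)
toy <- rbind(
  N1 = c(MR2 = 0.82, MR3 = -0.01, h2 = 0.67),
  C4 = c(MR2 = 0.07, MR3 = -0.67, h2 = 0.55)
)
tab <- cut_loadings(toy, loading_cols = c("MR2", "MR3"))
tab
```

The non-loading columns (here h2) stay numeric, so they can still be rounded or sorted afterwards.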
Ronak Shaw answered my question above, and I used his answer to help create the following function, which nearly reproduces the print.psych data.frame of fa.sort output:
library(psych)
library(dplyr)
library(tibble)

fa_table <- function(x, cut) {
  # get sorted loadings from the object passed in, as a plain numeric matrix
  loadings <- fa.sort(x)$loadings %>% unclass() %>% round(3)
  # blank out loadings whose absolute value is below the cutoff
  # (this turns the matrix into a character matrix)
  loadings[abs(loadings) < cut] <- ""
  # get additional info
  add_info <- cbind(x$communalities,
                    x$uniquenesses,
                    x$complexity) %>%
    as.data.frame() %>%
    rename("commonality" = V1,
           "uniqueness" = V2,
           "complexity" = V3) %>%
    rownames_to_column("item")
  # build table, matching rows by item name
  loadings %>%
    as.data.frame() %>%
    rownames_to_column("item") %>%
    left_join(add_info, by = "item") %>%
    mutate(across(where(is.numeric), ~ round(.x, 3)))
}
I have a list of all files (dataframes) within a directory:
library("plyr")
library("dplyr")
library("broom")
library("tidyr")
snp_list <- list.files(pattern="*.txt", all.files = T,full.names = F)
I also have a dataframe A obtained through the following function:
pv1= lapply(snp_list, function(x) tidy(lm(PV ~ GT*SEX + M + GT*N,read.table(x,header=TRUE)))) %>%
bind_rows()
Dataframe A has 7 rows ((Intercept), GT, SEX, M, N, GT:SEX, GT:N) for each element in list snp_list. In this toy example the list has 3 elements (rs1406947.txt rs25904.txt rs7133579.txt), but in reality there are 1,200,000 elements
A:
term estimate st.error statistic p.value
(Intercept) 7.68 0.17 44.64 0
GT 0.01 0.01 0.07 0.19
SEX 1.52 0.14 10.87 0.1
M 0.12 0.29 0.41 0.67
N -0.06 0.12 -0.48 0.63
GT:SEX -0.03 0.08 -0.44 0.65
GT:N -0.00 0.06 -0.08 0.93
(Intercept) 9.23 0.20 34.64 0
GT 0.05 0.04 0.12 0.22
SEX 1.67 0.76 10.34 0.1
M 0.14 0.39 0.51 0.55
N -0.08 0.05 -0.46 0.55
GT:SEX -0.19 0.11 -0.34 0.44
GT:N -0.22 0.33 -0.44 0.55
(Intercept) 7.99 0.66 44.44 0
GT 0.01 0.3 0.04 0.33
SEX 1.22 0.22 10.44 0.15
M 0.88 0.22 0.33 0.44
N -0.5 0.5 -0.5 0.6
GT:SEX -0.06 0.09 -0.74 0.35
GT:N -0.00 0.03 -0.04 0.78
I want to add a new column "SNP" to A, where each row has the name of the element the row belongs to (nrows = 7*1,200,000). I would get this:
term estimate st.error statistic p.value SNP
(Intercept) 7.68 0.17 44.64 0 rs1406947
GT 0.01 0.01 0.07 0.19 rs1406947
SEX 1.52 0.14 10.87 0.1 rs1406947
M 0.12 0.29 0.41 0.67 rs1406947
N -0.06 0.12 -0.48 0.63 rs1406947
GT:SEX -0.03 0.08 -0.44 0.65 rs1406947
GT:N -0.00 0.06 -0.08 0.93 rs1406947
(Intercept) 9.23 0.20 34.64 0 rs25904
GT 0.05 0.04 0.12 0.22 rs25904
SEX 1.67 0.76 10.34 0.1 rs25904
M 0.14 0.39 0.51 0.55 rs25904
N -0.08 0.05 -0.46 0.55 rs25904
GT:SEX -0.19 0.11 -0.34 0.44 rs25904
GT:N -0.22 0.33 -0.44 0.55 rs25904
(Intercept) 7.99 0.66 44.44 0 rs7133579
GT 0.01 0.3 0.04 0.33 rs7133579
SEX 1.22 0.22 10.44 0.15 rs7133579
M 0.88 0.22 0.33 0.44 rs7133579
N -0.5 0.5 -0.5 0.6 rs7133579
GT:SEX -0.06 0.09 -0.74 0.35 rs7133579
GT:N -0.00 0.03 -0.04 0.78 rs7133579
Here's how to do what you asked:
A$SNP <- character(nrow(A))
for (i in seq_len(nrow(A))) {
  A$SNP[i] <- snp_list[((i - 1) %/% 7) + 1]
}
Using integer division, ((i - 1) %/% 7) + 1 maps each block of 7 consecutive rows to one element of snp_list.
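With 7 * 1,200,000 rows, a vectorised version is likely to be much faster than a loop. The sketch below builds small stand-ins for A and snp_list (the real ones come from the question) and assumes that every model contributes exactly 7 rows, in the same order as the files in snp_list; rows_per_snp is a name introduced here for illustration.

```r
# Stand-ins for the real objects: 3 files, 7 coefficient rows per file
snp_list <- c("rs1406947.txt", "rs25904.txt", "rs7133579.txt")
terms <- c("(Intercept)", "GT", "SEX", "M", "N", "GT:SEX", "GT:N")
A <- data.frame(term = rep(terms, times = length(snp_list)))

rows_per_snp <- 7  # assumption: each lm() yields exactly 7 coefficient rows

# Guard against a mismatch before assigning the labels
stopifnot(nrow(A) == rows_per_snp * length(snp_list))

# Repeat each file name 7 times, dropping the ".txt" extension
A$SNP <- rep(tools::file_path_sans_ext(snp_list), each = rows_per_snp)
head(A, 8)
```

If the list handed to bind_rows() is named by SNP, its .id argument can create this column directly, which avoids relying on a fixed number of rows per model.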
Let's say we have df1 with p values:
Symbol p1 p2 p3 p4 p5
AABT 0.01 0.12 0.23 0.02 0.32
ABC1 0.13 0.01 0.01 0.12 0.02
ACDC 0.15 0.01 0.34 0.24 0.01
BAM1 0.01 0.02 0.04 0.01 0.02
BCR 0.01 0.36 0.02 0.07 0.04
BDSM 0.02 0.43 0.01 0.03 0.41
BGL 0.27 0.77 0.01 0.04 0.02
and df2 with Fold Changes:
Symbol FC1 FC2 FC3 FC4 FC5
AABT 1.21 -0.32 0.23 -0.72 0.45
ABC1 0.13 0.93 -1.61 0.12 1.03
ACDC 0.23 1.31 0.42 -0.39 1.50
BAM1 -1.33 -1.27 -0.89 1.22 -1.03
BCR 1.43 -0.25 1.29 0.54 0.97
BDSM 1.20 0.23 -1.98 -1.09 -0.31
BGL 0.33 0.12 -1.33 -1.14 -1.23
I would like to do the following in df2:
1. Keep rows that, in df1, have values < 0.05 in at least 3 of the 5 columns.
2. Eliminate rows that show discordant signs of FC. An FC should be taken into consideration only when the respective p from df1 is lower than 0.05 (i.e. significant).
3. Sort the resulting data in an intuitive order so as to discriminate rows having positive FC from rows having negative FC, and if possible, discriminate rows whose significant FCs arise sequentially (e.g. FC3 FC4 FC5) from others that don't (e.g. FC1 FC3 FC5).
For example, step 1 would result in:
Symbol FC1 FC2 FC3 FC4 FC5
ABC1 0.13 0.93 -1.61 0.12 1.03
BAM1 -1.33 -1.27 -0.89 1.22 -1.03
BCR 1.43 -0.25 1.29 0.54 0.97
BDSM 1.20 0.23 -1.98 -1.09 -0.31
BGL 0.33 0.12 -1.33 -1.14 -1.23
and step 2, in:
Symbol FC1 FC2 FC3 FC4 FC5
BCR 1.43 -0.25 1.29 0.54 0.97
BGL 0.33 0.12 -1.33 -1.14 -1.23
How can this be achieved? I imagine using a for loop and the count function would do the job for step 1, but steps 2 and 3 look somewhat complicated to me. Thank you in advance for your elegant solutions.
data
df1:
df1 <- read.table(h=T,strin=F,text="Symbol p1 p2 p3 p4 p5
AABT 0.01 0.12 0.23 0.02 0.32
ABC1 0.13 0.01 0.01 0.12 0.02
ACDC 0.15 0.01 0.34 0.24 0.01
BAM1 0.01 0.02 0.04 0.01 0.02
BCR 0.01 0.36 0.02 0.07 0.04
BDSM 0.02 0.43 0.01 0.03 0.41
BGL 0.27 0.77 0.01 0.04 0.02")
df2:
df2 <- read.table(h=T,strin=F,text="Symbol FC1 FC2 FC3 FC4 FC5
AABT 1.21 -0.32 0.23 -0.72 0.45
ABC1 0.13 0.93 -1.61 0.12 1.03
ACDC 0.23 1.31 0.42 -0.39 1.50
BAM1 -1.33 -1.27 -0.89 1.22 -1.03
BCR 1.43 -0.25 1.29 0.54 0.97
BDSM 1.20 0.23 -1.98 -1.09 -0.31
BGL 0.33 0.12 -1.33 -1.14 -1.23")
I'm not sure how elegant this is, but you can get the result you requested using apply and sapply with subsetting, like this:
# Create logical matrix telling us whether p values are significant
sig <- apply(df1[-1], 2, function(x) x < 0.05)
# Create numeric matrix of the sign of each FC (will be either -1 or 1)
sign <- apply(df2[-1], 2, function(x) sign(x))
# Create a vector telling us whether there were 3 or more p < 0.05 in each row
ss1 <- apply(sig, 1, function(x) length(which(x)) > 2)
# Create a vector telling us whether all FC signs match excluding p = ns
ss2 <- sapply(seq(nrow(df1)), function(i) length(table(sign[i,][sig[i,]])) == 1)
# Subset the data frames accordingly:
df1[ss1, ]
#> Symbol p1 p2 p3 p4 p5
#> 2 ABC1 0.13 0.01 0.01 0.12 0.02
#> 4 BAM1 0.01 0.02 0.04 0.01 0.02
#> 5 BCR 0.01 0.36 0.02 0.07 0.04
#> 6 BDSM 0.02 0.43 0.01 0.03 0.41
#> 7 BGL 0.27 0.77 0.01 0.04 0.02
df2[ss1 & ss2, ]
#> Symbol FC1 FC2 FC3 FC4 FC5
#> 5 BCR 1.43 -0.25 1.29 0.54 0.97
#> 7 BGL 0.33 0.12 -1.33 -1.14 -1.23
Created on 2020-07-10 by the reprex package (v0.3.0)
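The same two filters can also be written with matrix operations, which makes it easy to add a simple version of step 3. The sketch below is one possible reading of "intuitive order": rows with positive significant FCs first, then rows with negative ones. The names masked, direction and res are introduced here, and the ordering of sequential versus non-sequential significance is not attempted.

```r
df1 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Symbol p1 p2 p3 p4 p5
AABT 0.01 0.12 0.23 0.02 0.32
ABC1 0.13 0.01 0.01 0.12 0.02
ACDC 0.15 0.01 0.34 0.24 0.01
BAM1 0.01 0.02 0.04 0.01 0.02
BCR 0.01 0.36 0.02 0.07 0.04
BDSM 0.02 0.43 0.01 0.03 0.41
BGL 0.27 0.77 0.01 0.04 0.02")
df2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Symbol FC1 FC2 FC3 FC4 FC5
AABT 1.21 -0.32 0.23 -0.72 0.45
ABC1 0.13 0.93 -1.61 0.12 1.03
ACDC 0.23 1.31 0.42 -0.39 1.50
BAM1 -1.33 -1.27 -0.89 1.22 -1.03
BCR 1.43 -0.25 1.29 0.54 0.97
BDSM 1.20 0.23 -1.98 -1.09 -0.31
BGL 0.33 0.12 -1.33 -1.14 -1.23")

sig <- as.matrix(df1[-1]) < 0.05      # TRUE where p is significant
fc_sign <- sign(as.matrix(df2[-1]))   # -1, 0 or 1 for each fold change

# Step 1: at least 3 significant p values per row
enough_sig <- rowSums(sig) >= 3

# Step 2: keep only the signs of significant FCs, then ask whether they agree.
# direction is +1 or -1 for concordant rows and NA otherwise.
masked <- ifelse(sig, fc_sign, NA)
direction <- apply(masked, 1, function(s) {
  s <- s[!is.na(s)]
  if (length(s) > 0 && all(s == s[1])) s[1] else NA_real_
})

keep <- enough_sig & !is.na(direction)
res <- df2[keep, ]
res$direction <- direction[keep]

# Step 3 (simple version): positive rows first, then negative rows
res <- res[order(-res$direction), ]
res
```

On the sample data this keeps BCR (positive) and BGL (negative), matching the step 2 result in the question.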
Subject var1 var2 var3 var4 var5
1 0.2 0.78 7.21 0.5 0.47
1 0.52 1.8 11.77 -0.27 -0.22
1 0.22 0.84 7.32 0.35 0.36
2 0.38 1.38 10.05 -0.25 -0.2
2 0.56 1.99 13.76 -0.44 -0.38
3 0.35 1.19 7.23 -0.16 -0.06
4 0.09 0.36 4.01 0.55 0.51
4 0.29 1.08 9.48 -0.57 -0.54
4 0.27 1.03 9.42 -0.19 -0.21
4 0.25 0.9 7.06 0.12 0.12
5 0.18 0.65 5.22 0.41 0.42
5 0.15 0.57 5.72 0.01 0.01
6 0.26 0.94 7.38 -0.17 -0.13
6 0.14 0.54 5.13 0.16 0.17
6 0.22 0.84 6.97 -0.66 -0.58
6 0.18 0.66 5.79 0.23 0.25
# the above is sample data matrix (dat11)
# The following code calculates the p-value (p.z) for the
# variable pair var2 and var3 using lmer() from the lme4 package.
library(lme4)
fit1 <- lmer(var2 ~ var3 + (1|Subject), data = dat11)
summary(fit1)
coefs <- data.frame(coef(summary(fit1)))
# use normal distribution to approximate p-value
coefs$p.z <- 2 * (1 - pnorm(abs(coefs$t.value)))
round(coefs,6)
# the following is the result
Estimate Std.Error t.value p.z
(Intercept) -0.280424 0.110277 -2.542913 0.010993
var3 0.163764 0.013189 12.417034 0.000000
The real data contains 65 variables (var1, var2, ..., var65). I would like to use the above code to get this result for all possible pairs of the 65 variables, e.g. var1 ~ var2, var1 ~ var3, ..., var1 ~ var65; var2 ~ var3, var2 ~ var4, ..., var2 ~ var65; var3 ~ var4, and so on. There will be about 2000 pairs. Can somebody help me with the loop code and with writing the results to a .csv file? Thank you.
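One way to approach this is to generate the pairs with combn() and fit one model per pair. The sketch below runs on the 5-variable sample above and should scale to 65 variables (choose(65, 2) = 2080 pairs). It assumes that only the slope row of each fit is wanted, that columns are named var1, var2, and so on, and that a failed fit should simply be skipped; fit_pair and the output file name are made up for illustration.

```r
library(lme4)

dat11 <- read.table(header = TRUE, text = "Subject var1 var2 var3 var4 var5
1 0.2 0.78 7.21 0.5 0.47
1 0.52 1.8 11.77 -0.27 -0.22
1 0.22 0.84 7.32 0.35 0.36
2 0.38 1.38 10.05 -0.25 -0.2
2 0.56 1.99 13.76 -0.44 -0.38
3 0.35 1.19 7.23 -0.16 -0.06
4 0.09 0.36 4.01 0.55 0.51
4 0.29 1.08 9.48 -0.57 -0.54
4 0.27 1.03 9.42 -0.19 -0.21
4 0.25 0.9 7.06 0.12 0.12
5 0.18 0.65 5.22 0.41 0.42
5 0.15 0.57 5.72 0.01 0.01
6 0.26 0.94 7.38 -0.17 -0.13
6 0.14 0.54 5.13 0.16 0.17
6 0.22 0.84 6.97 -0.66 -0.58
6 0.18 0.66 5.79 0.23 0.25")

# All unordered pairs: var1 ~ var2, var1 ~ var3, ..., var4 ~ var5
vars <- grep("^var", names(dat11), value = TRUE)
pairs <- combn(vars, 2, simplify = FALSE)

# Hypothetical helper: fit response ~ predictor + (1 | Subject) for one pair
# and return the slope row with a normal-approximation p-value
fit_pair <- function(p) {
  f <- as.formula(paste(p[1], "~", p[2], "+ (1 | Subject)"))
  fit <- tryCatch(lmer(f, data = dat11), error = function(e) NULL)
  if (is.null(fit)) return(NULL)  # skip pairs whose model cannot be fitted
  cf <- coef(summary(fit))
  t_val <- cf[p[2], "t value"]
  data.frame(response = p[1], predictor = p[2],
             estimate = cf[p[2], "Estimate"],
             std_error = cf[p[2], "Std. Error"],
             t_value = t_val,
             p_z = 2 * (1 - pnorm(abs(t_val))))
}

results <- do.call(rbind, lapply(pairs, fit_pair))

out_file <- "pairwise_lmer_results.csv"  # hypothetical file name
write.csv(results, out_file, row.names = FALSE)
```

Some pairs may print singular-fit or convergence messages; those fits still return estimates, so it may be worth checking them separately before relying on their p-values.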