How can I split a string like
x = "0.989(0.975)&0.964(0.937)&0.877(0.771)&&0.962(0.903)&0.971(0.867)&0.932(0.828)&&0.984(0.892)&0.937(0.869)&0.910(0.722)&&0.970(0.867)&0.942(0.811)&0.875(0.747)"
to get all the numbers as a numeric vector like
y = c(0.989, 0.975, 0.964, 0.937, 0.877)
and so on.
I want to eliminate the parentheses, the "&" and "&&".
Use gsub with scan: gsub replaces every run of characters other than . and digits with a single delimiter (,), and scan then reads the result in one go.
out <- scan(text = gsub("[^.0-9]+", ",", x), what = numeric(),
sep=",", quiet = TRUE)
str(out)
#num [1:25] 0.989 0.975 0.964 0.937 0.877 0.771 0.962 0.903 0.971 0.867 ...
Another option using regmatches + as.numeric
as.numeric(regmatches(x, gregexpr("\\d+\\.\\d+", x))[[1]])
gives
[1] 0.989 0.975 0.964 0.937 0.877 0.771 0.962 0.903 0.971 0.867 0.932 0.828
[13] 0.984 0.892 0.937 0.869 0.910 0.722 0.970 0.867 0.942 0.811 0.875 0.747
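A third variant, sketched here on a shortened version of the string, splits directly on every run of non-numeric characters with strsplit. The name x_short is only illustrative and is not part of the question.

```r
# Shortened, illustrative input in the same format as x
x_short <- "0.989(0.975)&0.964(0.937)&&0.962(0.903)"

# Split on each run of characters that are neither digits nor "."
pieces <- strsplit(x_short, "[^0-9.]+")[[1]]

# Convert the character pieces to numbers
y <- as.numeric(pieces)
y
```

This relies on the string starting with a digit: strsplit drops a match at the end of the string, but a leading delimiter would produce an empty first piece, which as.numeric turns into NA.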
I have to plot data from immunized animals in a way that visualizes possible correlations with protection. As background, when we vaccinate an animal it produces antibodies, which might or might not be linked to protection. We immunized cattle with 9 different proteins and measured antibody titers, which go up to 1.5 (optical density, O.D.). We also measured tick load, which goes up to 5000. Each animal has different titers for each protein and a different tick load; maybe some proteins are more important for protection than others, and we think that a heatmap could illustrate this.
TL;DR: Plot a heatmap with one variable (Ticks) that goes from 6 up to 5000, and another variable (Prot1 to Prot9) that goes up to 1.5.
A sample of my data:
Animal Group Ticks Prot1 Prot2 Prot3 Prot4 Prot5 Prot6 Prot7 Prot8 Prot9
G1-54-102 control 3030 0.734 0.402 0.620 0.455 0.674 0.550 0.654 0.508 0.618
G1-130-102 control 5469 0.765 0.440 0.647 0.354 0.528 0.525 0.542 0.481 0.658
G1-133-102 control 2070 0.367 0.326 0.386 0.219 0.301 0.231 0.339 0.247 0.291
G3-153-102 vaccinated 150 0.890 0.524 0.928 0.403 0.919 0.593 0.901 0.379 0.647
G3-200-102 vaccinated 97 1.370 0.957 1.183 0.658 1.103 0.981 1.051 0.534 1.144
G3-807-102 vaccinated 606 0.975 0.706 1.058 0.626 1.135 0.967 0.938 0.428 1.035
I have little knowledge in R, but I'm really excited to learn more about it. So feel free to put whatever code you want and I will try my best to understand it.
Thank you in advance.
Luiz
Here is an option to use the ggplot2 package to create a heatmap. You will need to convert your data frame from wide format to long format. It is also important to convert the Ticks column from numeric to factor if the numbers are discrete.
library(tidyverse)
library(viridis)
dat2 <- dat %>%
  gather(Prot, Value, starts_with("Prot"))
ggplot(dat2, aes(x = factor(Ticks), y = Prot, fill = Value)) +
  geom_tile() +
  scale_fill_viridis()
DATA
dat <- read.table(text = "Animal Group Ticks Prot1 Prot2 Prot3 Prot4 Prot5 Prot6 Prot7 Prot8 Prot9
'G1-54-102' control 3030 0.734 0.402 0.620 0.455 0.674 0.550 0.654 0.508 0.618
'G1-130-102' control 5469 0.765 0.440 0.647 0.354 0.528 0.525 0.542 0.481 0.658
'G1-133-102' control 2070 0.367 0.326 0.386 0.219 0.301 0.231 0.339 0.247 0.291
'G3-153-102' vaccinated 150 0.890 0.524 0.928 0.403 0.919 0.593 0.901 0.379 0.647
'G3-200-102' vaccinated 97 1.370 0.957 1.183 0.658 1.103 0.981 1.051 0.534 1.144
'G3-807-102' vaccinated 606 0.975 0.706 1.058 0.626 1.135 0.967 0.938 0.428 1.035",
header = TRUE, stringsAsFactors = FALSE)
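For readers who want to see what the wide-to-long step does, here is a minimal base R sketch on a two-animal, two-protein subset of the sample data. The names dat_small, prot_cols and long are illustrative only; the answer itself uses gather.

```r
# Small subset of the sample data (two animals, two proteins)
dat_small <- data.frame(
  Animal = c("G1-54-102", "G3-153-102"),
  Ticks  = c(3030, 150),
  Prot1  = c(0.734, 0.890),
  Prot2  = c(0.402, 0.524),
  stringsAsFactors = FALSE
)

prot_cols <- c("Prot1", "Prot2")
n <- nrow(dat_small)

# One row per animal/protein pair: repeat the id columns once per protein
# and stack the protein columns on top of each other
long <- data.frame(
  Animal = rep(dat_small$Animal, times = length(prot_cols)),
  Ticks  = rep(dat_small$Ticks, times = length(prot_cols)),
  Prot   = rep(prot_cols, each = n),
  Value  = unlist(dat_small[prot_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```

Each row of long then corresponds to one tile of the heatmap: Ticks on one axis, Prot on the other, and Value as the fill.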
In recent versions of ggplot2 (3.0.0 and later), you don't even need to load the viridis package explicitly. The scale is included via scale_fill_viridis_c(). Exciting times!
I conducted a Factor Analysis in R and discovered there are 4 latent variables (factors).
I am trying to run a diagram/cluster visualization, to better represent my findings.
I would want something like this:
Here is my output:
print(factor2$loadings,cutoff = 0.3)
Loadings:
MR2 MR1 MR4 MR3
Inflatie 0.796
Dobanda 0.439
optq20 0.627
optq22 0.661
optq25 0.489
optq27 0.462
optq28 0.651
optq29 0.359
optq30 0.636
optq36 0.322
optq37 0.621
optq38 0.517
optq39 0.620
optq43 0.543
MR2 MR1 MR4 MR3
SS loadings 1.560 1.524 1.225 0.873
Proportion Var 0.111 0.109 0.087 0.062
Cumulative Var 0.111 0.220 0.308 0.370
I have searched all over the Internet but could not find any useful code for that.
Thank you!
In order to replicate the results of a previous study, I am trying to apply a method of factor analysis of a matrix that is described in Horst (1965) as "basic structure with simultaneous factor solution".
How would I approach this method in R?
Given a matrix m, and providing for instance that I extract two factors, I have tried applying the following:
fa(r = cor(m), rotate = 'none', factors = 2)
but I don't think this approach is right.
Just found out.
library(psych)
principal(r = cor(m), rotate = "none", nfactors = 2)
does the job. Horst refers to what is also called an eigenvalue decomposition. It can also be done using eigen(), which gives the same result.
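As a rough sketch of the eigen() route, assuming unrotated principal component loadings are what is wanted: the loadings are the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues. The matrix m here is random illustrative data, not the matrix from the textbook, and the result should agree with unrotated component loadings only up to the sign of each column.

```r
# Illustrative data only: 100 observations on 5 variables
set.seed(1)
m <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)

k <- 2                      # number of components to keep
e <- eigen(cor(m))          # eigen decomposition of the correlation matrix

# Loadings: eigenvectors scaled by the square root of their eigenvalues
loadings <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)

# The sum of squared loadings per component equals its eigenvalue
colSums(loadings^2)
e$values[1:k]
```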
Not really. The loadings seem pretty close, but looking at the maths I am not sure the method described below is akin to eigenvalue decomposition. In fact, looking more closely, the method is applied directly to the raw data, and no product-moment calculations are required.
I am trying (slowly) to work out the maths myself and to understand what the computation instructions describe.
For your information, here is the standardized matrix that is used for the calculation carried out in the example in the original textbook:
0.444 0.627 1.458 1.754 2.967 2.585 0.970 0.616 0.853
2.648 2.563 1.950 -1.341 -1.015 -0.700 0.904 0.976 0.150
-0.104 -0.159 0.049 0.510 -0.378 -0.468 2.217 2.378 2.291
-0.970 -1.216 -1.129 -0.079 -0.378 -0.645 0.287 0.312 -2.266
-1.164 -1.060 -1.485 -1.878 -0.021 -0.530 -1.483 0.190 -0.429
-0.956 -1.122 -0.938 -1.282 -0.779 0.121 0.447 -1.565 -0.429
0.198 -0.242 -0.055 0.021 0.526 -1.528 -0.575 -1.244 -0.114
-0.035 -0.485 1.129 -0.014 -0.894 -0.316 -1.421 -0.705 -0.349
-1.050 0.786 -0.048 0.101 -0.354 -0.433 -0.298 -0.377 -0.256
0.298 0.197 -0.010 0.558 0.253 0.464 -0.284 -0.240 -0.031
0.568 0.367 -0.429 0.811 -0.007 0.786 -0.250 0.081 0.541
0.125 -0.256 -0.492 0.839 0.079 0.665 -0.513 -0.422 0.039
Here are the computation instructions and examples:
I was wondering whether this is just a standard approach in factor analysis or in principal component analysis, and if so, which one? The introduction says that this method is a rank-reduction type of solution, in the sense that the major product of the factor score and factor loading matrices yields a residual whose rank is precisely that of the original matrix less the number of factors.
This particular type of analysis is "direct" in the sense that it is carried out directly on the raw data (at most, on the normalized matrix).
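One way to explore the rank-reduction property numerically is a truncated singular value decomposition applied directly to the standardized data matrix. Whether this is exactly the procedure Horst describes is an assumption here, not something the excerpt confirms. The matrix m below is random illustrative data with the same 12 x 9 shape as the textbook example, and num_rank is a hypothetical helper.

```r
# Illustrative standardized data, 12 rows by 9 columns (not the textbook values)
set.seed(42)
m <- scale(matrix(rnorm(12 * 9), nrow = 12, ncol = 9))

k <- 2                                   # number of factors to extract
s <- svd(m)                              # m = U D V'

# "Scores" and "loadings" built from the first k singular triplets
scores   <- s$u[, 1:k, drop = FALSE]
loadings <- s$v[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k)

# Residual after removing the rank-k product of scores and loadings
residual <- m - scores %*% t(loadings)

# Hypothetical helper: numerical rank = number of non-negligible singular values
num_rank <- function(a, tol = 1e-8) sum(svd(a)$d > tol)

rank_m     <- num_rank(m)
rank_resid <- num_rank(residual)
c(original = rank_m, residual = rank_resid)
```

In this sketch the residual's rank is the original rank minus k, which is the property the introduction attributes to the method.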
I'm trying to declare the colorAxis and let a series of computed "Scores" define the gradient for coloring the bubbles. The visualization just keeps giving me random colors, all with the "OutlierScore" next to them on an ugly legend to the right of the plot. I don't understand what I'm doing wrong as my options list matches all of the demo codes I find. I'm using the final gvisBubbleChart statement as the output to my renderGvis code in server.R.
Here's some sample data:
Attribute CloseRate Quotes OutlierScore Size
AdvancedShopper:N 0.261 3411 292.47 1.016
AdvancedShopper:Y 0.119 10421 259.68 2.283
PriorCarrier:HP 0.277 1876 186.46 0.739
Vehicles:1 0.183 8784 179.98 1.988
Vehicles:2 0.106 3471 121.81 1.027
LeadType:Cold 0.104 3177 117.09 0.974
SPINOFF:Y 0.414 510 115.65 0.492
LeadType:Warm 0.223 2184 115.47 0.795
MULTI_CAR_DSCNT_FLG:HMC 0.303 879 107.88 0.559
MULTI_CAR_DSCNT_FLG:MC 0.111 3451 105.75 1.024
PRI_CARR_NME:HP 0.253 1287 100.58 0.633
PriorCarrier:GEICO 0.099 2476 99.74 0.847
PriorCarrier:No Prior Insurance 0.304 802 99.61 0.545
PRI_CARR_NME:No Prior Insurance 0.304 802 99.61 0.545
FR_BAND:P-R 0.112 3227 98.15 0.983
PIP_DED:2,500 0.197 3053 95.11 0.952
AgencyName:South Agency 0.213 2120 94.81 0.783
RSrc:SPIN-OFF Additional Policy 0.434 373 91.99 0.467
CompanionType:None 0.141 11332 87.60 2.448
D2V:D1V1 0.175 5830 85.67 1.454
Here's my gvisBubbleChart declaration.
YLim = c(0,max(GData$Quotes)*1.05)
XLim = c(0,max(GData$CloseRate)*1.01)
gvisBubbleChart(GData, idvar="Attribute", xvar="CloseRate", yvar="Quotes", colorvar="OutlierScore", sizevar="Size",
options=list(title="One-Way Bubble Chart",
hAxis=paste("{title: 'Close Rate', minValue:0, maxValue:",XLim[2],"}",sep=""),
vAxis=paste("{title: 'Quotes', minValue:0, maxValue:",YLim[2],"}",sep=""),
width=1400, height=600, colorAxis="{minValue: 0, colors: ['red', 'green']}",
sizeAxis = '{minValue: 0, maxSize: 10}',
bubble="{textStyle:{color: 'none'}}"))
A previous post explained how to do a Chi-squared loop in R on all your data-pairs: Chi Square Analysis using for loop in R.
I wanted to use this code to do the same thing for a Spearman correlation.
I've already tried altering a few of the variables, and I was able to calculate the Pearson correlations using this code:
library(plyr)
combos <- combn(ncol(fullngodata),2)
adply(combos, 2, function(x) {
test <- cor.test(fullngodata[, x[1]], fullngodata[, x[2]])
out <- data.frame("Row" = colnames(fullngodata)[x[1]]
, "Column" = colnames(fullngodata[x[2]])
, "cor" = round(test$statistic,3)
, "df"= test$parameter
, "p.value" = round(test$p.value, 3)
)
return(out)
})
But since I work with data on an ordinal scale, I need to use the Spearman correlation.
I thought I could get this by just adding the method="spearman" argument, but this does not seem to work. If I use the code:
library(plyr)
combos <- combn(ncol(fullngodata),2)
adply(combos, 2, function(x) {
test <- cor.test(fullngodata[, x[1]], fullngodata[, x[2]], method="spearman")
out <- data.frame("Row" = colnames(fullngodata)[x[1]]
, "Column" = colnames(fullngodata[x[2]])
, "Chi.Square" = round(test$statistic,3)
, "df"= test$parameter
, "p.value" = round(test$p.value, 3)
)
return(out)
})
I get the response:
Error in data.frame(Row = colnames(fullngodata)[x[1]], Column =
colnames(fullngodata[x[2]]), :
arguments imply differing number of rows: 1, 0
In addition: Warning message:
In cor.test.default(fullngodata[, x[1]], fullngodata[, x[2]], method = "spearman") :
Cannot compute exact p-values with ties
What am I doing wrong?
Try the rcor.test function in the ltm package.
library(ltm)
mat <- matrix(rnorm(1000), 100, 10, dimnames = list(NULL, LETTERS[1:10]))
rcor.test(mat, method = "spearman")
A B C D E F G H I J
A ***** -0.035 0.072 0.238 -0.097 0.007 -0.010 -0.031 0.039 -0.090
B 0.726 ***** -0.042 -0.166 0.005 0.025 0.007 -0.231 0.005 0.006
C 0.473 0.679 ***** 0.046 0.074 -0.020 0.091 -0.183 -0.040 -0.084
D 0.017 0.098 0.647 ***** -0.060 -0.151 -0.175 -0.068 0.039 0.181
E 0.338 0.960 0.466 0.553 ***** 0.254 0.055 -0.031 0.072 -0.059
F 0.948 0.805 0.843 0.133 0.011 ***** -0.014 -0.121 0.153 0.048
G 0.923 0.941 0.370 0.081 0.588 0.892 ***** -0.060 -0.050 0.011
H 0.759 0.021 0.069 0.501 0.756 0.230 0.555 ***** -0.053 -0.193
I 0.700 0.963 0.690 0.701 0.476 0.130 0.621 0.597 ***** -0.034
J 0.373 0.955 0.406 0.072 0.561 0.633 0.910 0.055 0.736 *****
upper diagonal part contains correlation coefficient estimates
lower diagonal part contains corresponding p-values
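The problem is that cor.test returns NULL for parameter when you run the Spearman test. From ?cor.test: parameter: the degrees of freedom of the test statistic in the case that it follows a t distribution. The "df" entry passed to data.frame is therefore empty, which is what the "differing number of rows: 1, 0" error is reporting.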
You can see this in the following example:
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y <- c( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
str(cor.test(x, y, method = "spearman"))
List of 8
$ statistic : Named num 48
..- attr(*, "names")= chr "S"
$ parameter : NULL
$ p.value : num 0.0968
$ estimate : Named num 0.6
..- attr(*, "names")= chr "rho"
$ null.value : Named num 0
..- attr(*, "names")= chr "rho"
$ alternative: chr "two.sided"
$ method : chr "Spearman's rank correlation rho"
$ data.name : chr "x and y"
- attr(*, "class")= chr "htest"
Solution: if you remove the following line from your code, it should work:
, "df"= test$parameter