hcboxplot not building correct boxplots - r

While working on a project, I had a dataframe with a column for which I needed to build a boxplot using highcharter.
The column looked like this:
head(data,20)
Col1
1 30
2 30
3 30
4 30
5 28
6 27
7 29
8 27
9 30
10 30
11 28
12 29
13 29
14 30
15 30
16 30
17 30
18 29
19 30
20 NA
and so on; data has 693 observations. The column is of class 'labelled', so I wrap it in as.numeric() when building the boxplot.
When I build the boxplot with hcboxplot:
hcboxplot(x = as.numeric(data$Col1[!is.na(data$Col1)]))
The boxplot details I get are -
Maximum: 30
Upper Quartile: 30
Median: 29
Lower Quartile: 28
Minimum: 25
But the data definitely contains values below 25 (even 0):
> min(data$Col1[!is.na(data$Col1)])
[1] 0
So the maximum value is correct, but the minimum definitely is not.
Why is the boxplot not coming out right? Any help is appreciated! Apologies for not being able to add a picture to the question (reputation restrictions).
Thank you.
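A possible explanation, sketched with base R: a boxplot's whiskers usually end at the most extreme data points within 1.5 times the interquartile range of the box, and anything further out is treated as an outlier rather than as the "Minimum". With the quartiles reported above (28 and 30), the lower fence is 28 - 1.5 * 2 = 25, which is consistent with the value shown. The vector below is made up for illustration (it is not the real column), and the assumption that hcboxplot follows the same convention as boxplot.stats() is mine, not something stated in the question.

```r
# Made-up values shaped like the column in the question (not the real data)
x <- c(0, 10, 25, 27, 27, 28, 28, 29, 29, 29, 30, 30, 30, 30, 30, 30)

# boxplot.stats() uses coef = 1.5 by default
bs <- boxplot.stats(x)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
whiskers_and_box <- bs$stats
# out: points lying more than 1.5 * IQR beyond the hinges
outliers <- bs$out

print(min(x))               # true minimum of the data: 0
print(whiskers_and_box[1])  # lower whisker end: 25
print(outliers)             # values reported separately as outliers: 0 10
```

If this is what is happening, the values below 25 are not lost; they would be drawn as individual outlier points below the lower whisker.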

Related

How do I get the subset of a data frame where Column A equals Column B in R (data read from CSV)

I am trying to create a subset of a data frame.
The original data frame looks like this:
Column A Column B Column C
---------------------------------
22 22 30
18 35 28
25 25 29
25 42 22
75 75 33
I would like to get the subset where the Column A value == the Column B value. The end result would look like this:
Column A Column B Column C
---------------------------------
22 22 30
25 25 29
75 75 33
Is there a one-liner to achieve this? Thanks!
Note: I read the data from a CSV file (I didn't mention this in the original post, sorry).
I get an error when I try: df[df$Column.A==df$Column.B,]
Error in Ops.factor(df$ColumnA, df$ColumnB) :
level sets of factors are different
Here's a one-liner:
df1[df1$Column.A==df1$Column.B,]
# Column.A Column.B Column.C
#1 22 22 30
#3 25 25 29
#5 75 75 33
data
df1 <- read.table(text="Column.A Column.B Column.C
22 22 30
18 35 28
25 25 29
25 42 22
75 75 33", header=TRUE)
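The "level sets of factors are different" error in the question suggests that the CSV columns were read as factors, which the one-liner above does not cover. A minimal sketch of one workaround, assuming the factor labels are what should be compared: convert both columns to character before testing for equality. The data frame below is rebuilt by hand with factor columns to mimic that situation; it is not the asker's actual CSV.

```r
# Hand-built stand-in for a CSV whose first two columns came in as factors
df <- data.frame(
  Column.A = factor(c(22, 18, 25, 25, 75)),
  Column.B = factor(c(22, 35, 25, 42, 75)),
  Column.C = c(30, 28, 29, 22, 33)
)

# Compare the labels, not the factors themselves
same <- as.character(df$Column.A) == as.character(df$Column.B)

# Keep only the rows where the two columns agree
result <- df[same, ]
print(result)
```

Reading the file with stringsAsFactors = FALSE, or fixing whatever non-numeric entry caused the columns to be read as factors, would avoid the conversion altogether.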

Data frame header in R

I am trying to make some calculations with data from oracle db using R. I connected to the DB and extracted the data correctly.
> y=dbGetQuery(con, "select distinct(fk_parametro) from t_datos")
> y
FK_PARAMETRO
1 30
2 42
3 43
4 83
5 87
6 1
7 6
8 44
9 20
10 14
11 86
12 88
13 85
14 81
15 35
16 8
17 80
18 89
19 7
20 12
21 82
22 9
23 10
The following command works:
> sum(y)
[1] 1042
But this one fails:
> mean(y)
[1] NA
Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA
I think it happens because R is treating the header "FK_PARAMETRO" as an element. Can someone help me figure this out?
As commented by @akrun, this works:
mean(y[,1])
Or, as suggested by @PierreLafortune, you could also do:
colMeans(y)
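A short sketch of why those two calls work: dbGetQuery() returns a data frame, and mean() expects a numeric vector, so the column has to be extracted first (the header is not being counted as a value). The data frame below is a small hand-made stand-in for the query result, using the first three values shown above.

```r
# Stand-in for the query result: a one-column data frame
y <- data.frame(FK_PARAMETRO = c(30, 42, 43))

# The object is a data frame, not a numeric vector
print(is.data.frame(y))

# Extract the column as a vector before calling mean()
m_index <- mean(y[, 1])
m_name <- mean(y$FK_PARAMETRO)

# colMeans() works on the data frame directly and returns a named vector
m_col <- colMeans(y)

print(m_index)
print(m_col)
```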

Overlay two differently formatted qplots in ggplot2

I have two scatterplots, based on different but related data, created using qplot() from ggplot2. (Learning ggplot hasn't been a priority because qplot has been sufficient for my needs up to now.) What I want to do is superimpose/overlay the two charts so that the x,y data for each is plotted in the same plot space. The complication is that I want each plot to retain its formatting/aesthetics.
The data in question are row and column scores from correspondence analysis - corresp() from MASS - so the number of data rows (i.e. samples or taxa) differs between the two datasets. I can plot the two score sets together easily, either by combining the two datasets or, even easier, by just using the biplot() function.
However, I have been using qplot to get the plots looking exactly as I need them, with samples plotted as colour-coded symbols and taxa as labels:
PlotSample <- qplot(DataCorresp$rscore[,1], DataCorresp$rscore[,2],
colour=factor(DataAll$ColourCode)) +
scale_colour_manual(values = c("black","darkgoldenrod2",
"deepskyblue2","deeppink2"))
and
PlotTaxa <- qplot(DataCorresp$cscore[,1], DataCorresp$cscore[,2],
label=colnames(DataCorresp), size=10, geom="text")
Can anyone suggest a way by which either
the two plots (PlotSample and PlotTaxa) can be superimposed atop of each other,
the two datasets (DataCorresp$rscore and DataCorresp$cscore) can be plotted together but formatted in their different ways, or
another function (e.g. biplot()) that could be used to achieve my aim.
Example of the workflow using an extremely simplified and made-up dataset:
> require(MASS)
> require(ggplot2)
> alldata<-read.csv("Fake data.csv",header=TRUE,row.names=1)
> selectdata<-alldata[,2:10]
> alldata
Period Species.1 Species.2 Species.3 Species.4 Species.5 Species.6
Sample-1 Early 50 87 97 12 60 49
Sample-2 Early 41 90 36 52 36 27
Sample-3 Early 87 56 82 45 56 13
Sample-4 Early 37 47 78 29 53 34
Sample-5 Early 58 70 34 35 8 21
Sample-6 Early 94 82 48 16 27 26
Sample-7 Early 91 69 50 57 24 13
Sample-8 Early 63 38 86 20 28 11
Sample-9 Middle 4 19 55 99 86 38
Sample-10 Middle 29 25 10 93 37 54
Sample-11 Middle 48 12 59 73 39 92
Sample-12 Middle 31 6 34 81 39 54
Sample-13 Middle 29 40 26 52 34 84
Sample-14 Middle 1 46 15 97 67 41
Sample-15 Late 43 47 30 18 60 23
Sample-16 Late 45 10 49 2 2 45
Sample-17 Late 14 8 51 36 58 51
Sample-18 Late 41 51 32 47 23 43
Sample-19 Late 43 17 6 54 4 12
Sample-20 Late 20 25 1 29 35 2
Species.7 Species.8 Species.9
Sample-1 41 39 57
Sample-2 59 4 45
Sample-3 10 56 5
Sample-4 59 30 39
Sample-5 9 29 57
Sample-6 29 24 35
Sample-7 22 4 42
Sample-8 31 19 40
Sample-9 17 7 57
Sample-10 6 9 29
Sample-11 34 20 0
Sample-12 56 41 59
Sample-13 6 31 13
Sample-14 25 12 28
Sample-15 60 75 84
Sample-16 32 69 34
Sample-17 48 53 56
Sample-18 80 86 46
Sample-19 50 70 82
Sample-20 57 84 70
> selectca<-corresp(selectdata,nf=5)
> biplot(selectca,cex=c(0.6,0.6))
> PlotSample <- qplot(selectca$rscore[,1], selectca$rscore[,2], colour=factor(alldata$Period) )
> PlotTaxa<-qplot(selectca$cscore[,1], selectca$cscore[,2], label=colnames(selectdata), size=10, geom="text")
The biplot will produce this plot: /r/10wk1a8/5
The PlotSample appears as such: /r/i29cba/5
The PlotTaxa appears as such: /r/245bl9d/5
EDIT: I don't have enough rep to post pictures, and tinypic links are not accepted (despite https://meta.stackexchange.com/questions/60563/how-to-upload-images-on-stack-overflow). If you add tinypic's URL to the start of the codes above, you'll get there.
Essentially I want to create the biplot plot but with samples colour coded as they are in PlotSample.
Have a look at Gavin Simpson's ggvegan package!
require(vegan)
require(ggvegan)
# some data
data(dune)
# CA
mod <- cca(dune)
# plot
autoplot(mod, geom = 'text')
For finer control (or if you want to stick with corresp()), you may also want to take a look at the code of the two functions involved: fortify.cca (which wraps the data in the cca object into a usable format for ggplot) and autoplot.cca (which creates the plot).
If you want to do it from scratch, you'll have to wrap both sets of scores (sites and species) into one data.frame (see how fortify.cca does this and extract the relevant values from the corresp() object) and use this to build the plot.
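The from-scratch route could look roughly like the sketch below, assuming MASS and ggplot2 are installed. It uses simulated counts rather than the asker's file, and it keeps the two score sets in separate data frames passed to separate layers instead of one combined data.frame, which is a simplification of the approach described above. The names counts, period, samples and taxa are made up for illustration.

```r
library(MASS)
library(ggplot2)

# Simulated abundance table: 20 samples by 9 species (not the asker's data)
set.seed(1)
counts <- matrix(rpois(20 * 9, lambda = 40) + 1, nrow = 20,
                 dimnames = list(paste0("Sample-", 1:20),
                                 paste0("Species.", 1:9)))
period <- rep(c("Early", "Middle", "Late"), times = c(8, 6, 6))

# Correspondence analysis, keeping the first two axes
ca <- corresp(counts, nf = 2)

# Row scores (samples) and column scores (taxa) as separate data frames
samples <- data.frame(dim1 = ca$rscore[, 1], dim2 = ca$rscore[, 2],
                      period = period)
taxa <- data.frame(dim1 = ca$cscore[, 1], dim2 = ca$cscore[, 2],
                   label = colnames(counts))

# One plot, two layers: coloured points for samples, text labels for taxa
p <- ggplot() +
  geom_point(data = samples, aes(x = dim1, y = dim2, colour = period)) +
  geom_text(data = taxa, aes(x = dim1, y = dim2, label = label), size = 3) +
  scale_colour_manual(values = c(Early = "black",
                                 Middle = "darkgoldenrod2",
                                 Late = "deepskyblue2"))
```

Printing p draws the overlay. With the objects from the question, selectca and alldata$Period would take the place of ca and period.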

For loop in R with increments

I am trying to write a for loop which will increment its value by 2. The equivalent code in C is
for (i=0; i<=78; i=i+2)
How do I achieve the same in R?
See ?seq for more info:
for(i in seq(from=1, to=78, by=2)){
# stuff, such as
print(i)
}
or
for(i in seq(1, 78, 2))
p.s. Pardon my C ignorance. There, I just outed myself.
However, this is a way to do what you want in R (please see updated code)
EDIT
After learning a bit of how C works, it looks like the example posted in the question iterates over the following sequence: 0 2 4 6 8 ... 74 76 78.
To replicate that exactly in R, start at 0 instead of at 1, as above.
seq(from=0, to=78, by=2)
[1] 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44
[24] 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78
You can do it the following way. Replace length(v1) with whatever upper limit you want to iterate to, and replace the 2 with the increment you need:
for(i in seq(1,length(v1),2))
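A small runnable sketch of that last form, using a made-up vector v1. One caveat that I am adding here: seq(1, length(v1), by = 2) raises an error when v1 is empty, because the sequence would have to run from 1 down to 0 with a positive step, so the loop is guarded.

```r
# Made-up vector to step through
v1 <- c("a", "b", "c", "d", "e")

visited <- character(0)

# Guard against an empty vector, where seq(1, 0, by = 2) would fail
if (length(v1) > 0) {
  for (i in seq(1, length(v1), by = 2)) {
    # Visits positions 1, 3, 5
    visited <- c(visited, v1[i])
  }
}

print(visited)
```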

Import stuff from a R file

I'm studying R, and I was asked to use the pi2000 data set, available at the simpleR library (http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html/simpleR.R). Then I downloaded this file. How do I import it to the command line?
Use the source function:
> source("http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html/simpleR.R")
You can then access the variables defined there:
> vacation
[1] 23 12 10 34 25 16 27 18 28 13 14 20 8 21 23 33 30 13 16 14 38 19 6 11 15 21 10
[28] 39 42 25 12 17 19 26 20
For more information, type ?source into an R terminal.
Do you mean load the function into R? If so, use source():
source("http://www.math.csi.cuny.edu/Statistics/R/simpleR/index.html/simpleR.R")
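Since the question mentions that the file was already downloaded, source() can also be pointed at a local path instead of the URL. The sketch below writes a tiny stand-in script to a temporary file so that it runs anywhere; the file contents and the variable name demo_values are made up, and the real simpleR.R would be used in its place.

```r
# Create a small stand-in script in a temporary location
script <- tempfile(fileext = ".R")
writeLines("demo_values <- c(3, 1, 4, 1, 5)", script)

# Run the file; objects it defines become available in the workspace
source(script)

print(demo_values)
```

For a downloaded file in the working directory, the call would simply be source("simpleR.R"); getwd() shows which directory that is.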

Resources