R: Cleaning GGally Plots

I am using the R programming language and I am new to the GGally library. I followed some basic tutorials online and ran the following code:
#load libraries
library(GGally)
library(survival)
library(plotly)
I changed some of the data types:
#manipulate the data
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
Now I visualize:
#make the plots
#I dont know why, but this comes out messy
ggparcoord(data, groupColumn = "sex")
#Cleaner
ggparcoord(data)
Both ggparcoord() calls ran successfully, but the first one came out pretty messy (the axis labels seem to have been corrupted). Is there a way to fix the labels?
The second graph makes it difficult to tell how the factor variables are laid out on their respective axes (e.g. for the "sex" column, is "male" the bottom point or is "female" the bottom point?). Does anyone know if there is a way to fix this?
Finally, is there a way to use the "ggplotly()" function for "ggally" objects?
e.g.
a = ggparcoord(data)
ggplotly(a)
Thanks

BTW: I'm not sure about the general case, but at least for ggparcoord, ggplotly works.
It looks like your data columns get converted to factors when you add the groupColumn. To prevent that, you could exclude the groupColumn from the columns to be plotted:
library(GGally)
library(survival)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
# plot every column except "sex", and use "sex" only for grouping
ggparcoord(data, seq(ncol(data))[!names(data) %in% "sex"], groupColumn = "sex")
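On the question of which factor level ends up at the bottom of an axis: the sketch below assumes that ggparcoord() turns non-grouping factor columns into their integer level codes before scaling, which is my reading of its behaviour and is not confirmed in the answer above. Under that assumption the first level gets the lowest value. The toy vector is a stand-in for lung$sex, where 1 codes male and 2 codes female.

```r
# Toy stand-in for lung$sex (in survival::lung, 1 = male and 2 = female)
sex <- factor(c(1, 2, 2, 1), levels = c(1, 2), labels = c("male", "female"))

# A factor is stored as integer codes that follow the order of its levels
codes <- as.numeric(sex)

# Lookup table: which level corresponds to which numeric position
mapping <- data.frame(level = levels(sex), code = seq_along(levels(sex)))
print(mapping)
```

If the assumption holds, relabelling or reordering the levels before plotting is one way to control which category sits at the bottom.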

Related

Represent a colored polygon in ggplot2

I am using the spatstat package because I am working on spatial patterns.
I would like to reproduce the following graph in ggplot, with colors instead of numbers (because the numbers are not very readable).
The graph was produced with the plot.quadrattest function: Polygone
The numbers that interest me for the intensity of the colors are those at the bottom of each box.
The test object contains the following data:
Test object
I have looked at the help of the function, as well as the code of the function but I still cannot manage it.
Ideally I would like my final figure to look like this (maybe not with the same colors haha):
Final object
Thanks in advance for your help.
Please provide a reproducible example in the future.
The package reprex may be very helpful.
To use ggplot2 for this my best bet would be to convert
spatstat objects to sf and do the plotting that way,
but it may take some time. If you are willing to use base
graphics and spatstat you could do something like:
library(spatstat)
# Data (using a built-in dataset):
X <- unmark(chorley)
plot(X, main = "")
# Test:
test <- quadrat.test(X, nx = 4)
# Default plot:
plot(test, main = "")
# Extract the `quadratcount` object (regions with observed counts):
counts <- attr(test, "quadratcount")
# Convert to `tess` (raw regions with no numbers)
regions <- as.tess(counts)
# Add residuals as marks to the tessellation:
marks(regions) <- test$residuals
# Plot regions with marks as colors:
plot(regions, do.col = TRUE, main = "")
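For reference, the number at the bottom of each box in the default plot is the Pearson residual of that quadrat. A minimal base R sketch of that calculation follows; the counts are made up for illustration and the quadrats are assumed to have equal area, so every quadrat has the same expected count.

```r
# Hypothetical observed counts for four equal-area quadrats
observed <- c(12, 7, 3, 10)

# Under complete spatial randomness with equal areas, the expected
# count is the same in every quadrat
expected <- rep(sum(observed) / length(observed), length(observed))

# Pearson residual: (observed - expected) / sqrt(expected)
pearson <- (observed - expected) / sqrt(expected)

# The chi-squared statistic is the sum of the squared residuals
chisq_stat <- sum(pearson^2)
print(round(pearson, 3))
```

Positive residuals mark quadrats with more points than expected and negative residuals mark quadrats with fewer, which is why they work well as a color scale.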

Error in axis(side = side, at = at, labels = labels, ...) : invalid value specified for graphical parameter "pch"

I have applied the DBSCAN algorithm to the built-in iris dataset in R, but I get an error when I try to visualise the output using plot().
Following is my code.
library(fpc)
library(dbscan)
data("iris")
head(iris,2)
data1 <- iris[,1:4]
head(data1,2)
set.seed(220)
db <- dbscan(data1,eps = 0.45,minPts = 5)
table(db$cluster,iris$Species)
plot(db,data1,main = 'DBSCAN')
Error: Error in axis(side = side, at = at, labels = labels, ...) :
invalid value specified for graphical parameter "pch"
How to rectify this error?
I have a suggestion below, but first I see two issues:
1. You're loading two packages, fpc and dbscan, both of which have different functions named dbscan(). This could create tricky bugs later (e.g. if you change the order in which you load the packages, different functions will be run).
2. It's not clear what you're trying to plot, either what the x- or y-axes should be or the type of plot. The function plot() generally takes a vector of values for the x-axis and another for the y-axis (although not always, consult ?plot), but here you're passing it a data.frame and a dbscan object, and it doesn't know how to handle it.
Here's one way of approaching it, using ggplot() to make a scatterplot, and dplyr for some convenience functions:
# load our packages
# note: only loading dbscan, not loading fpc since we're not using it
library(dbscan)
library(ggplot2)
library(dplyr)
# run dbscan::dbscan() on the first four columns of iris
db <- dbscan::dbscan(iris[,1:4],eps = 0.45,minPts = 5)
# create a new data frame by binding the derived clusters to the original data
# this keeps our input and output in the same dataframe for ease of reference
data2 <- bind_cols(iris, cluster = factor(db$cluster))
# make a table to confirm it gives the same results as the original code
table(data2$cluster, data2$Species)
# using ggplot, make a point plot with "jitter" so each point is visible
# x-axis is species, y-axis is cluster, also coloured according to cluster
ggplot(data2) +
  geom_point(mapping = aes(x = Species, y = cluster, colour = cluster),
             position = "jitter") +
  labs(title = "DBSCAN")
Here's the image it generates:
If you're looking for something else, please be more specific about what the final plot should look like.
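The name clash mentioned in the first issue can be reproduced without installing anything. The sketch below is a base R analogy, not the fpc/dbscan case itself: a locally defined sd() masks stats::sd(), and the package::function form still reaches the intended version.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# A local definition with the same name masks the one from the stats package
sd <- function(x) "masked version"
masked_result <- sd(x)

# The explicit namespace prefix always reaches the intended function
explicit_result <- stats::sd(x)

# Remove the local definition so later code sees stats::sd again
rm(sd)
print(explicit_result)
```

Calling dbscan::dbscan() or fpc::dbscan() explicitly, as in the code above, avoids depending on the order of the library() calls.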

Using multiple datasets for one graph

I have 2 csv data files. Each file has a "date_time" column and a "temp_c" column. I want to make the x-axis have the "date_time" from both files and then use 2 y-axes to display each "temp_c" with separate lines. I would like to use plot instead of ggplot2 if possible. I haven't been able to find any code help that works with my data and I'm not sure where to really begin. I know how to do 2 separate plots for these 2 datasets, just not combine them into one graph.
plot(grewl$temp_c ~ grewl$date_time)
and
plot(kbll$temp_c ~ kbll$date_time)
work separately but not together.
As others indicated, it is easy to add new data to a graph using points() or lines(). One thing to be careful about is how you format the axes as they will not be automatically adjusted to fit any new data you input using points() and the like.
I've included a small example below that you can copy, paste, run, and examine. Pay attention to why the first plot fails to produce what you want (axes are bad). Also note how I set this example up generally - by making fake data that showcase the same "problem" you are having. Doing this is often a better strategy than simply pasting in your data since it forces you to think about the core component of the problem you are facing.
#for same result each time
set.seed(1234)
#make data
set1 <- data.frame("date1" = seq(1, 10),
                   "temp1" = rnorm(10))
set2 <- data.frame("date2" = seq(8, 17),
                   "temp2" = rnorm(10, 1, 1))
#first attempt fails
#plot one
plot(set1$date1, set1$temp1, type = "b")
#add points - oops only three showed up bc the axes are all wrong
lines(set2$date2, set2$temp2, type = "b")
#second attempt
#adjust axes to fit everything (set to min and max of either dataset)
plot(set1$date1, set1$temp1,
     xlim = c(min(set1$date1, set2$date2), max(set1$date1, set2$date2)),
     ylim = c(min(set1$temp1, set2$temp2), max(set1$temp1, set2$temp2)),
     type = "b")
#now add the other points
lines(set2$date2, set2$temp2, type = "b")
# we can even add regression lines
abline(reg = lm(set1$temp1 ~ set1$date1))
abline(reg = lm(set2$temp2 ~ set2$date2))
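The question also asked for date-times on the x-axis and a separate y-axis for each series. Here is a hedged sketch of one way to do that in base graphics, using par(new = TRUE) to overlay a second plot and axis(side = 4) for the right-hand axis. The data frames reuse the names grewl and kbll from the question, but their contents are invented for illustration.

```r
# Made-up hourly readings standing in for the two CSV files
grewl <- data.frame(
  date_time = as.POSIXct("2021-06-01 00:00:00", tz = "UTC") + 3600 * 0:5,
  temp_c = c(14.1, 13.8, 13.5, 14.0, 15.2, 16.9)
)
kbll <- data.frame(
  date_time = as.POSIXct("2021-06-01 03:00:00", tz = "UTC") + 3600 * 0:5,
  temp_c = c(21.0, 20.4, 20.1, 21.3, 22.8, 24.0)
)

# Shared x-range so both series line up on the same time axis
xlim <- range(grewl$date_time, kbll$date_time)

# Leave room on the right-hand side for the second y-axis
par(mar = c(5, 4, 4, 4) + 0.1)

# First series with the left y-axis
plot(grewl$date_time, grewl$temp_c, type = "l", xlim = xlim,
     xlab = "date_time", ylab = "grewl temp_c")

# Overlay the second series on its own y-scale, without redrawing axes
par(new = TRUE)
plot(kbll$date_time, kbll$temp_c, type = "l", col = "red", xlim = xlim,
     axes = FALSE, xlab = "", ylab = "")
axis(side = 4, col.axis = "red")
mtext("kbll temp_c", side = 4, line = 3, col = "red")
```

Two independent y-scales can make the lines look more or less similar than they are, so a single shared ylim, as in the answer above, is often the safer choice when both series are in the same unit.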

rCharts Polychart: Adding horizontal or vertical lines to a plot

I'm having some trouble understanding how to customize graphs using the rPlot function in the rCharts Package. Say I have the following code
#Install rCharts if you do not already have it
#This will require devtools, which can be downloaded from CRAN
require(devtools)
#current versions of devtools expect the "user/repo" form
install_github('ramnathv/rCharts')
library(rCharts)
#simulate some random normal data
x <- rnorm(100, 50, 5)
y <- rnorm(100, 30, 2)
#store in a data frame for easy retrieval
demoData <- data.frame(x,y)
#generate the rPlot Object
demoChart <- rPlot(y~x, data = demoData, type = 'point')
#return the object // view the plot
demoChart
This will generate a plot and that is nice, but how would I go about adding horizontal lines along the y-axis? For example, if I wanted to plot a green line which represented the average y-value, and then red lines which represented +/- 3 standard deviations from the average? If anybody knows of some documentation and could point me to it then that would be great. However, the only documentation I could find was on the polychart.js (https://github.com/Polychart/polychart2) and I'm not quite sure how to apply this to the rCharts rPlot function in R.
I have done some digging and I feel like the answer is going to have something to do with adding/modifying the layers parameter within the rPlot object.
#look at the slots in this object
demoChart$params$layers
#doing this will return the following output (which will be different for
#everybody because I didn't set a seed). Also, I removed rows 6:100 of the data.
demoChart$params$layers
[[1]]
[[1]]$x
[1] "x"
[[1]]$y
[1] "y"
[[1]]$data
x y
1 49.66518 32.75435
2 42.59585 30.54304
3 53.40338 31.71185
4 58.01907 28.98096
5 55.67123 29.15870
[[1]]$facet
NULL
[[1]]$type
[1] "point"
If I figure this out I will post a solution, but I would appreciate any help/advice in the meantime! I don't have much experience playing with objects in R. I feel like this is supposed to have some similarity to ggplot2 which I also don't have much experience with.
Thanks for any advice!
You can overlay additional graphs onto your rCharts plot using layers. Add values for any additional layers as columns on to your original data.frame. copy_layer lets you use the values from the data.frame in the extra layers.
# Regression Plots using rCharts
require(rCharts)
mtcars$avg <- mean(mtcars$mpg)
mtcars$sdplus <- mtcars$avg + sd(mtcars$mpg)
mtcars$sdneg <- mtcars$avg - sd(mtcars$mpg)
p1 <- rPlot(mpg~wt, data=mtcars, type='point')
p1$layer(y='avg', copy_layer=T, type='line', color=list(const='red'))
p1$layer(y='sdplus', copy_layer=T, type='line', color=list(const='green'))
p1$layer(y='sdneg', copy_layer=T, type='line', color=list(const='green'))
p1
Here are a couple of examples: one from the main rCharts website and the other showing how to overlay a regression line.
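The answer's example draws the mean and one standard deviation on mtcars, while the question asked for the mean and +/- 3 standard deviations on demoData. The sketch below only prepares the extra columns in base R; the column names avg, upper and lower are hypothetical, and the seed is added so the numbers are reproducible.

```r
# Recreate the question's simulated data with a fixed seed
set.seed(42)
demoData <- data.frame(x = rnorm(100, 50, 5), y = rnorm(100, 30, 2))

# Constant columns holding the reference values for the horizontal lines
demoData$avg <- mean(demoData$y)
demoData$upper <- demoData$avg + 3 * sd(demoData$y)
demoData$lower <- demoData$avg - 3 * sd(demoData$y)

head(demoData, 3)
```

These columns could then be passed to the layer() calls in the same way as avg, sdplus and sdneg in the answer, with green for avg and red for the two limits.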

Stacked bar in R

I have a table exported in csv from PostgreSQL and I'd like to create a stacked bar graph in R. It's my first project in R.
Here's my data and what I want to do:
It's the quality of the feeder bus service for a certain provider in the area. For each user of the train, we assign a service quality based on the synchronization between the bus and the train at the train stations, and calculate the percentage of users that have an ideal or very good service, a correct service, a deficient service or no service at all (linked to that question in gis.stackexchange)
So, I'd like to use my first column as my x-axis labels and my headers as my categories. The data is already normalized to 100% for each row.
In Excel, it's a couple of clicks, and I wouldn't mind typing a couple of lines of code since it's the final result of an already quite long plpgsql script... I'd prefer to continue to code instead of moving to Excel (I also have dozens of those to do).
So, I tried to create a stacked bar using the examples in Nathan Yau's "Visualize This" and the book "R in Action" and wasn't quite successful. Normally, their examples use data that they aggregate with R and use that. Mine is already aggregated.
So, I've finally come up with something that works in R:
but I had to transform my data quite a bit:
I had to transpose my table and remove my now-row (ex-column) identifier.
Here's my code:
# load libraries
library(ggplot2)
library(reshape2)
# load data
stl <- read.csv("D:/TEMP/rabat/_stl_rabattement_stats_mtl.csv", sep=";", header=TRUE)
# reshape for plotting
stl_matrix <- as.matrix(stl)
# make a quick plot
barplot(stl_matrix, border=NA, space=0.1, ylim=c(0, 100), xlab="Trains", ylab="%",
main="Qualité du rabattement, STL", las = 3)
Is there any way that I could use my original csv and have the same result?
I'm a little lost here...
Thanks!!!!
Try the ggplot2 and reshape2 packages (melt() comes from reshape2). You should be able to get the chart you want with:
stl$train_order <- as.numeric(rownames(stl))
stl.r <- melt(stl, id.vars = c("train_no", "train_order"))
stl.r$train_no <- factor(
stl.r$train_no,
levels = stl$train_no[order(stl$train_order)])
ggplot(stl.r, aes(x = factor(train_no), y = value, fill = variable)) + geom_bar(stat = 'identity')
It appears that you transposed the matrix manually. This can be done in R with the t() function.
Add the following line after the as.matrix(stl) line:
stl_matrix <- t(stl_matrix)
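Putting the t() suggestion together with the original barplot() call, a sketch under assumptions might look like the following. The data frame is invented: only train_no appears in the thread, and the category column names and values here are hypothetical stand-ins for the real CSV.

```r
# Hypothetical stand-in for the CSV: one row per train, rows sum to 100
stl <- data.frame(
  train_no  = c(181, 183, 185),
  ideal     = c(50, 40, 30),
  correct   = c(30, 30, 30),
  deficient = c(10, 20, 20),
  none      = c(10, 10, 20)
)

# Drop the identifier column, then transpose so each train becomes a column
stl_matrix <- t(as.matrix(stl[, -1]))
colnames(stl_matrix) <- stl$train_no

# barplot() stacks the rows of the matrix within each column
barplot(stl_matrix, border = NA, space = 0.1, ylim = c(0, 100),
        xlab = "Trains", ylab = "%", las = 3, legend.text = TRUE)
```

Dropping the identifier before as.matrix() keeps the matrix numeric, and setting colnames() puts the train numbers back on the x-axis.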
