I need to modify the scale of the y-axis in a ggplot2 graphic: I want to express the y-axis in thousands and not in units. For example, the labels have to be 0; 1,000; 2,000; 3,000 instead of 0; 1000000; 2000000; 3000000.
Please don't tell me to divide my data by 1000!
My question is the same as ggplot2 axis transformation by constant factor. But the solution provided there modifies the labels parameter of the scale_y_continuous function, whereas I need this parameter to be set to comma. With that solution I get the following labels: 0; 1000; 2000; 3000... The values are divided by 1000, which is good, but I lose the comma formatting. I want to see 1,000; 2,000; 3,000 and not 1000; 2000; 3000...
So modifying the labels parameter of the scale_y_continuous function isn't useful. That's why I think I have to work with the trans parameter of scale_y_continuous instead of the labels parameter.
There are a lot of built-in transformations in the scales package that can be passed to the trans parameter and solve similar problems (look at log_trans for example). So I tried to build my own homothetic transformation with the code below:
library(ggplot2)
library(scales)  # provides trans_new() and comma
var0 <- c(1:100)
var1 <- 1000000*rnorm(100)
homothetic_breaks <- function(n = 5, base = 1000)
{
  function(x) {
    rng <- (range(x, na.rm = TRUE)/base)
    min <- floor(rng[1])
    max <- ceiling(rng[2])
    if (max == min)
      return(base*min)
    by <- floor((max - min)/n) + 1
    base*seq(min, max, by = by)
  }
}
homothetic_trans <- function(base = 1000) {
  trans <- function(x) x/base
  inv <- function(x) x*base
  trans_new(paste0("diviseur_par_", format(base)), trans, inv,
            homothetic_breaks(base = base), domain = c(-Inf, Inf))
}
data <- data.frame(var0, var1)
p <- ggplot(data, aes(var0, var1)) + geom_path()
p <- p + scale_y_continuous(trans = homothetic_trans, labels = comma)
p
When I run this code I get the following message:
"Error: Input to str_c should be atomic vectors", and the breaks of the y axis are the same as the ones I get when I run the following code:
library(ggplot2)
var1 <- 1000*rnorm(100)
var0 <- c(1:100)
data <- data.frame(var0,var1)
p <- ggplot(data,aes(var0,var1))+geom_path()
p
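For comparison, here is a minimal sketch of a labels-only approach that keeps the comma formatting without touching the data or the transformation. It assumes scales 1.1.0 or later, where label_comma() accepts a scale argument. (The str_c error above most likely comes from passing the constructor homothetic_trans itself instead of the object returned by homothetic_trans(); scale_y_continuous() expects a transformation object or its name.)

```r
library(ggplot2)
library(scales)

set.seed(1)
data <- data.frame(var0 = 1:100, var1 = 1e6 * rnorm(100))

# label_comma(scale = 1e-3) multiplies each break value by 0.001 before
# formatting it with a comma separator, so 2e6 is printed as "2,000".
# Only the labels change; var1 itself is not divided.
thousands <- label_comma(scale = 1e-3)

p <- ggplot(data, aes(var0, var1)) +
  geom_path() +
  scale_y_continuous(labels = thousands)
```

Printing p draws the plot; the breaks stay wherever ggplot2 puts them, only their text is rescaled.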
When plotting the ratio between two variables, their relative order is often of no concern, yet depending on which variable is in the numerator, its relative size is constrained either to (0,1) or (1, Inf), which is somewhat unintuitive and breaks symmetry. I want to plot ratios "symmetrically", without resorting to symmetric log-scale, by having a y-axis that goes like 1/4, 1/3, 1/2, 1, 2, 3, 4 or, equivalently, 4^-1, 3^-1, 2^-1, 1, 2, 3, 4 in regular intervals. I've come up with the following:
library(ggplot2)
library(scales)  # provides trans_new()
symmult <- function(x){
  isf <- is.finite(x) & (x>0)
  xf <- x[isf]
  xf <- ifelse(xf>=1,
               xf-1,
               1-(1/xf))
  x[isf] <- xf
  x[!isf] <- NA
  x[!is.finite(x)] <- NA
  return(x)
}
symmultinv <- function(x){
  isf <- is.finite(x)
  xf <- x[isf]
  xf <- ifelse(x[isf]>=0,
               x[isf]+1,
               -1/(x[isf]-1))
  x[isf] <- xf
  x[!isf] <- NA
  x[!is.finite(x)] <- NA
  return(x)
}
sym_mult_trans = function(){trans_new("sym_mult", symmult, symmultinv )}
x <- c(-4:-2, 1:4)
x[x<1] <- 1/abs(x[x<1])
ggplot() +
  geom_point(aes(x=x, y=x)) +
  scale_y_continuous(trans="sym_mult")
The transformation works, but I cannot get the axis breaks and labels to work for any 0<x<1 without setting them manually. Any help would be greatly appreciated.
You can create bespoke 'breaks' and 'format' functions that you can use inside trans_new (or pass to scale_y_continuous directly via its breaks and labels parameters).
For the breaks function, remember it will take as input a length-two numeric vector representing the range of the y axis. You must then convert this to a number of appropriate breaks. Here, if the minimum of the range is less than one, we take its reciprocal, find the pretty breaks between one and that number, then take the reciprocal of the output. We concatenate that onto pretty breaks between 1 and our range maximum:
# Define breaks function
symmult_breaks <- function(x) {
  c(1 / extended_breaks(5)(c(1/x[x < 1], 1)),
    extended_breaks(5)(c(1, x[x >= 1])))
}
For the labelling function, remember, it needs to take as input the vector of numbers produced by our breaks function. We can paste a 1/ in front of the reciprocal of numbers less than one, but leave numbers of 1 or more unaltered:
# Define labelling function
symmult_labs <- function(x) {
  labs <- character(length(x))
  labs[x >= 1] <- as.character(x[x >= 1])
  labs[x < 1] <- paste("1", as.character(1/x[x < 1]), sep = "/")
  labs
}
So your full new transformation becomes:
# Use our four functions to define the whole transformation:
sym_mult_trans <- function() {
  trans_new(name = "sym_mult",
            transform = symmult,
            inverse = symmultinv,
            breaks = symmult_breaks,
            format = symmult_labs)
}
And your plot becomes:
ggplot() +
  geom_point(aes(x = x, y = x)) +
  scale_y_continuous(trans = "sym_mult")
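As a quick sanity check of the pieces above, here is a self-contained sketch. It uses compact, vectorised stand-ins for symmult(), symmultinv() and symmult_labs() (the names fwd, inv and labs_fun are hypothetical; they assume strictly positive data and skip the NA handling of the originals). It confirms that the inverse undoes the transform and that the labels come out as 1/4, 1/2, 1, 2, 4, and it shows the other route mentioned above: passing breaks and labels directly to scale_y_continuous().

```r
library(ggplot2)
library(scales)

# Hypothetical compact stand-ins for symmult() / symmultinv(); they assume
# finite, strictly positive data and skip the NA clean-up of the originals
fwd <- function(x) ifelse(x >= 1, x - 1, 1 - 1 / x)
inv <- function(y) ifelse(y >= 0, y + 1, 1 / (1 - y))

# Same rule as symmult_labs(): "1/k" below one, the plain number otherwise
labs_fun <- function(x) ifelse(x < 1, paste0("1/", 1 / x), as.character(x))

vals <- c(1/4, 1/2, 1, 2, 4)
fwd(vals)                        # symmetric around 0: -3 -1 0 1 3
all.equal(inv(fwd(vals)), vals)  # TRUE: the inverse undoes the transform
labs_fun(vals)                   # "1/4" "1/2" "1" "2" "4"

# Breaks and labels given to scale_y_continuous() directly,
# instead of being bundled into trans_new()
ratio_trans <- trans_new("ratio_sym", fwd, inv)
x <- c(1/4, 1/3, 1/2, 1, 2, 3, 4)
p <- ggplot(data.frame(x = x), aes(x, x)) +
  geom_point() +
  scale_y_continuous(trans = ratio_trans, breaks = vals, labels = labs_fun)
```

Bundling breaks and format into trans_new() is handier when the scale is reused; passing them to the scale is handier for a one-off plot.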
I have developed code that calculates a value for a given set of parameters; this works for a single set of parameters.
library(spatstat)
library(ggplot2)
library(dplyr)
library(tidyr)
#Generating a clustered landscape
dim <- 2000
radiusCluster<-100
lambdaParent<-.02
lambdaDaughter<-30
hosts<-900
randmod<-0
numbparents<-rpois(1,lambdaParent*dim)
xxParent<-runif(numbparents,0+radiusCluster,dim-radiusCluster)
yyParent<-runif(numbparents,0+radiusCluster,dim-radiusCluster)
numbdaughter<-rpois(numbparents,(lambdaDaughter))
sumdaughter<-sum(numbdaughter)
theta<-2*pi*runif(sumdaughter)
rho<-radiusCluster*sqrt(runif(sumdaughter))
xx0=rho*cos(theta)
yy0=rho*sin(theta)
xx<-rep(xxParent,numbdaughter)
yy<-rep(yyParent,numbdaughter)
xx<-xx+xx0
yy<-yy+yy0
cds<-data.frame(xx,yy)
is_outlier<-function(x){
  x > dim| x < 0
}
cds<-cds[!(is_outlier(cds$xx)|is_outlier(cds$yy)),]
sampleselect<-sample(1:nrow(cds),hosts,replace=F)
cds<-cds%>%slice(sampleselect)
randfunction<-function(x){
  x<-runif(length(x),0,dim)
}
randselect<-sample(1:nrow(cds),floor(hosts*randmod),replace=F)
cds[randselect,]<-apply(cds[randselect,],1,randfunction)
landscape<-ppp(x=cds$xx,y=cds$yy,window=owin(xrange=c(0,dim),yrange=c(0,dim)))
ggplot(data.frame(landscape))+geom_point(aes(x=x,y=y))+coord_equal()+theme_minimal()
#Calculating a metric for clustering
kk<-Kest(landscape)
plot(kk)
kk_iso<-kk$iso
kk_pois<-kk$theo
kk_div_na<-kk_iso/kk_pois
kk_div_0<-replace_na(kk_div_na,0)
kk_mean<-round(mean(kk_div_0),3)
So I can say for radiusCluster of 100 and randmod of 0, I get a kk_mean of "value". I want to use radiusCluster and randmod as my variables and run this experiment for a set of these variables. I begin by generating the data table that I want.
random_parameter<-rep(c(0,.5,1),3)
radiusCluster_parameter<-rep(c(100,300,600),each=3)
Cluster_metric<-rep(NA,length(radiusCluster_parameter))
parameter_table<-data.frame(random_parameter,radiusCluster_parameter,Cluster_metric)
colnames(parameter_table)<-c("r", "rho", "sigma")
Here r is randmod, rho is radiusCluster and sigma is kk_mean.
Then I create a function of the above code for generating the clustered landscape and calculating the metric.
cluster_function <- function (dim,
                              lambdaParent,
                              lambdaDaughter,
                              hosts,
                              randmod,
                              radiusCluster) {
  numbparents <- rpois(1, lambdaParent * dim)
  xxParent <- runif(numbparents, 0 + radiusCluster, dim - radiusCluster)
  yyParent <- runif(numbparents, 0 + radiusCluster, dim - radiusCluster)
  numbdaughter <- rpois(numbparents, (lambdaDaughter))
  sumdaughter <- sum(numbdaughter)
  theta <- 2 * pi * runif(sumdaughter)
  rho <- radiusCluster * sqrt(runif(sumdaughter))
  xx0 = rho * cos(theta)
  yy0 = rho * sin(theta)
  xx <- rep(xxParent, numbdaughter)
  yy <- rep(yyParent, numbdaughter)
  xx <- xx + xx0
  yy <- yy + yy0
  cds <- data.frame(xx, yy)
  is_outlier <- function(x) {
    x > dim | x < 0
  }
  cds <- cds[!(is_outlier(cds$xx) | is_outlier(cds$yy)), ]
  sampleselect <- sample(1:nrow(cds), hosts, replace = F)
  cds <- cds %>% slice(sampleselect)
  randfunction <- function(x) {
    x <- runif(length(x), 0, dim)
  }
  randselect <- sample(1:nrow(cds), floor(hosts * randmod), replace = F)
  cds[randselect, ] <- apply(cds[randselect, ], 1, randfunction)
  landscape <- ppp(x = cds$xx, y = cds$yy, window = owin(xrange = c(0, dim), yrange = c(0, dim)))
  ggplot(data.frame(landscape)) + geom_point(aes(x = x, y = y)) + coord_equal() + theme_minimal()
  kk <- Kest(landscape)
  plot(kk)
  kk_iso <- kk$iso
  kk_pois <- kk$theo
  kk_div_na <- kk_iso / kk_pois
  kk_div_0 <- replace_na(kk_div_na, 0)
  kk_mean <- round(mean(kk_div_0), 3)
}
I then try running cluster_function for a set of parameters; however, this does not work.
cluster_function(dim <- 2000,
lambdaParent <-.02,
lambdaDaughter<-30,
hosts<-900,
randmod<-0,
radiusCluster<-0)
The parameters are defined in the global environment, but nothing happens. So I decided to remove the landscape and ggplot commands from the function and assign the function's result to an output. Hopefully the output would then be a data frame of the coordinates that I generated in cds, which could be used in a ppp() call and be plottable.
output<-cluster_function(dim <- 2000,
lambdaParent <-.02,
lambdaDaughter<-30,
hosts<-900,
randmod<-0,
radiusCluster<-0)
Output is numeric (empty). How can I get the function to work for the parameters in the cluster_function() and is it possible to run this for multiple parameters? I was thinking something along the lines of:
for (i in length(parameter_table)){
  cluster_function(dim <- 2000,
                   lambdaParent <-.02,
                   lambdaDaughter<-30,
                   hosts<-900,
                   randmod<-parameter_table[i,"r"],
                   radiusCluster<-parameter_table[i,"rho"])
}
I then try running cluster_function for a set of parameters, however, this does not work
It looks like it's working to me ;) Do you want the ggplot to be printed? A ggplot object created inside a function is not printed automatically, so you can add p <- ggplot(...) followed by print(p) to see it (you may need to refresh the plot viewer...).
Output is numeric (empty). How can I get the function to work
Add an explicit return: return(cds)
And you can of course run the function multiple times. A for loop works, or you could check out purrr::pmap() or mapply(). Good luck!
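A minimal sketch of both options, assuming cluster_function() ends with return(kk_mean) so that it returns the metric. To keep it runnable without spatstat, cluster_function is replaced here by a hypothetical stand-in with the same arguments; swap the real one back in. Two details matter: arguments are passed by name with =, not <- (which in the calls above also creates global variables), and the loop runs over seq_len(nrow(parameter_table)), because length() of a data frame counts its columns, not its rows.

```r
# Parameter grid, built as in the question
parameter_table <- data.frame(
  r     = rep(c(0, 0.5, 1), 3),
  rho   = rep(c(100, 300, 600), each = 3),
  sigma = NA_real_
)

# Hypothetical stand-in with cluster_function()'s signature so the sketch
# runs without spatstat; the real function should end with return(kk_mean)
cluster_function <- function(dim, lambdaParent, lambdaDaughter, hosts,
                             randmod, radiusCluster) {
  round(radiusCluster / dim * (1 - randmod), 3)
}

# Option 1: a for loop over the rows
for (i in seq_len(nrow(parameter_table))) {
  parameter_table$sigma[i] <- cluster_function(
    dim = 2000, lambdaParent = 0.02, lambdaDaughter = 30, hosts = 900,
    randmod = parameter_table$r[i],
    radiusCluster = parameter_table$rho[i]
  )
}

# Option 2: mapply() varies r and rho; MoreArgs holds the fixed arguments
sigma2 <- mapply(cluster_function,
                 randmod = parameter_table$r,
                 radiusCluster = parameter_table$rho,
                 MoreArgs = list(dim = 2000, lambdaParent = 0.02,
                                 lambdaDaughter = 30, hosts = 900))
```

purrr::pmap() follows the same idea as mapply(), taking the varying columns as a list.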
I have a dataset with numeric and factor variables. I want one page of plots for the numeric variables and another for the factor variables. First, I select the factor variables and their indices.
My df is the iris dataset:
df<-iris
df$y<-sample(0:1,nrow(iris),replace=TRUE)
fact<-colnames(df)[sapply(df,is.factor)]
index_fact<-which(names(df)%in%fact)
Then I count the remaining (numeric) variables:
nm<-ncol(df)-length(fact)
The next step is to create the loop:
i_F=1
i_N=1
list_plotN<- list()
list_plotF<- list()
for (i in 1:length(df)){
  plot <- ggplot(df,aes(x=df[,i],color=y,fill=y))+xlab(names(df)[i])
  if (is.factor(df[,i])){
    p_factor<-plot+geom_bar()
    list_plotF[[i_F]]<-p_factor
    i_F=i_F+1
  }else{
    p_numeric <- plot+geom_histogram()
    list_plotN[[i_N]]<-p_numeric
    i_N=i_N+1
  }
}
When I look at list_plotF and list_plotN, the result is wrong: the plots always show the same variable. I don't know what I'm doing wrong.
Thanks!
I don't really follow your for loop code all that well, but from what I can see every stored plot ends up showing the same (last) column: aes(x=df[,i]) is only evaluated when a plot is drawn, and by then i holds its final value. I've reconstructed what I think you need using lapply. I generally prefer lapply to for loops whenever I can.
lapply takes a list of values and a function and applies that function to every value. You can define your function separately, like I have, so everything looks cleaner; then you just name the function in the lapply call.
In our case the list is the vector of column names of your data frame df. The function first creates our base plot, then does a quick check to see whether the column it is looking at is a factor. If it's a factor it creates a bar graph; otherwise it creates a histogram.
histOrBar <- function(var) {
  basePlot <- ggplot(df, aes_string(var))
  if ( is.factor(df[[var]]) ) {
    basePlot + geom_bar()
  } else {
    basePlot + geom_histogram()
  }
}
loDFs <- lapply(colnames(df), histOrBar)
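aes_string() has since been deprecated in ggplot2 in favour of the .data pronoun. Here is a sketch of the same idea with aes(.data[[var]]) that also keeps the fill by y and splits the result into the two lists from the question. hist_or_bar is just a renamed histOrBar; fill = factor(y) (so y gets a discrete fill) and bins = 30 are assumptions, not part of the original code.

```r
library(ggplot2)

df <- iris
df$y <- sample(0:1, nrow(df), replace = TRUE)

# Build one plot for a column name; .data[[var]] looks the column up by name
# inside df, so every plot is tied to its own variable
hist_or_bar <- function(var) {
  base_plot <- ggplot(df, aes(x = .data[[var]], fill = factor(y))) +
    xlab(var)
  if (is.factor(df[[var]])) {
    base_plot + geom_bar()
  } else {
    base_plot + geom_histogram(bins = 30)
  }
}

is_fact <- vapply(df, is.factor, logical(1))
list_plotF <- lapply(names(df)[is_fact], hist_or_bar)   # factor columns
list_plotN <- lapply(names(df)[!is_fact], hist_or_bar)  # numeric columns
```

With iris plus y, list_plotF holds one bar chart (Species) and list_plotN holds five histograms.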
Consider passing column names with aes_string to better align x with df:
for (i in 1:length(df)){
  plot <- ggplot(df, aes_string(x=names(df)[i], color="y", fill="y")) +
    xlab(names(df)[i])
  ...
}
To demonstrate the problem with aes() and the solution with aes_string() in the OP's context, consider the following random data frame with columns of different data types: factor, char, int, num, bool, date.
Data
library(ggplot2)
set.seed(1152019)
alpha <- c(LETTERS, letters, c(0:9))
data_tools <- c("sas", "stata", "spss", "python", "r", "julia")
random_df <- data.frame(
  group = sample(data_tools, 500, replace=TRUE),
  int = as.numeric(sample(1:15, 500, replace=TRUE)),
  num = rnorm(500),
  char = replicate(500, paste(sample(LETTERS[1:2], 3, replace=TRUE), collapse="")),
  bool = as.numeric(sample(c(TRUE, FALSE), 500, replace=TRUE)),
  date = as.Date(sample(as.integer(as.Date('2019-01-01', origin='1970-01-01')):as.integer(Sys.Date()),
                        500, replace=TRUE), origin='1970-01-01')
)
Graph
fact <- colnames(random_df)[sapply(random_df,is.factor)]
index_fact <- which(names(random_df) %in% fact)
i_F=1
i_N=1
list_plotN <- list()
list_plotF <- list()
plot <- NULL
for (i in 1:length(random_df)){
  # aes() VERSION
  #plot <- ggplot(random_df, aes(x=random_df[,i], color=group, fill=group)) +
  #  xlab(names(random_df)[i])
  # aes_string() VERSION
  plot <- ggplot(random_df, aes_string(x=names(random_df)[i], color="group", fill="group")) +
    xlab(names(random_df)[i])
  if (is.factor(random_df[,i])){
    p_factor <- plot + geom_bar()
    list_plotF[[i_F]] <- p_factor
    i_F=i_F+1
  }else{
    p_numeric <- plot + geom_histogram()
    list_plotN[[i_N]] <- p_numeric
    i_N=i_N+1
  }
}
Problem (using aes() where graph outputs DO NOT change according to type)
Solution (using aes_string() where graphs DO change according to type)
(:
I don't know much about R, but I was required to use it to plot a set of box plots from a data ensemble.
I have a set of .csv files representing a set of 2D data. They contain the following columns:
i: the row of the matrix
j: the column of the matrix
VBoot: a property of the matrix
My data is 128 x 128, but each .csv only contains the indices of the non-zero entries.
I have to plot a box plot for each of these files, side by side.
This is my approach:
library(ggplot2)
library(reshape)
# Set the directory to read the files
setwd("/Users/me/data/CSV/")
operatorProperty <- function(operator, property, degrees, m, n)
{
  p <- list()
  for (degree in degrees)
  {
    file <- paste(c(degree, operator, property, ".csv"), collapse="")
    data <- read.csv(file, header=TRUE, sep=" ", dec=".")
    # Create an array m * n to fill with the data
    b <- vector(mode="double", length=(m*n))
    # Rebuild the complete data to properly build the box plot
    b[data$i * m + data$j] = sqrt(data$VBoot)
    p <- append(p, list(b))
  }
  p
}
So far, I have just created a list holding the data for each ensemble.
Then I thought I should build a data.frame:
min_degree = 0
max_degree = 45
delta = 5
m = 128
n = 128
degrees <- seq(min_degree, max_degree, delta)
property <- "VBoot"
operator <- "Prewitt"
Sobel <- operatorProperty(operator, property, degrees, m, n)
df <- data.frame(degrees, Sobel)
df2<- melt(data=df,id.vars="degrees")
p <- ggplot(df2, aes(x=degrees,y=value,colour=variable)) +
geom_boxplot() +
theme(legend.title=element_blank()) +
xlab(expression(theta)) +
ylab("Bootstrap Variance")
However, I can't build the data.frame. I don't know how to proceed. An example of the data can be found here.
Thank you in advance.
Ok. Well, I had to change a few things to get this to work with the sample data. Here's the setup:
m = 128
n = 128
operatorProperty <- function(operator, property, degrees, m, n)
{
  Map(function(degree) {
    file <- paste(c(degree, operator, property, ".csv"), collapse="")
    data <- read.table(file, header=TRUE, dec=".")
    # Create an array m * n to fill with the data
    b <- vector(mode="double", length=(m*n))
    # Rebuild the complete data to properly build the box plot
    b[data$i * m + data$j] = sqrt(data[[property]])
    b
  }, degrees)
}
degrees <- c('00','05')
property <- "MSE"
operator <- "Prewitt"
Sobel <- operatorProperty(operator, property, degrees, m, n)
With this modified form, Sobel is a list with named elements corresponding to the different degrees. We can turn this into a data.frame and plot with
df2<- melt(data.frame(Sobel, check.names=F))
p <- ggplot(df2, aes(x=variable,y=value,colour=variable)) +
geom_boxplot() +
theme(legend.title=element_blank()) +
xlab(expression(theta)) +
ylab("Bootstrap Variance")
which looks very funny because you have so many zeros. All of your non-zero entries are just marked as outliers.
But even if we hadn't nicely named the Sobel list, it was basically a list of two vectors (one for each degree):
list(c(0,0,0,0, ...), c(0,0,0,0,...))
If you wanted to merge that with degrees and turn it into a data.frame, another choice could have been:
do.call(rbind, Map(cbind.data.frame, degrees, Sobel))
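Another base-R option, sketched here on a small hypothetical stand-in for Sobel (a named list of two short vectors), is stack(). It turns a named list of vectors into a long data frame with a values column and an ind factor holding the list names, which is the shape geom_boxplot() needs.

```r
library(ggplot2)

# Hypothetical stand-in for the named list built by operatorProperty():
# one vector per degree, named by degree
Sobel <- list(`00` = c(0, 0, 1.5), `05` = c(0, 2.1, 0))

# stack() gives a long data frame: `values` (all entries) and `ind`
# (a factor holding the list name each entry came from)
long <- stack(Sobel)
names(long) <- c("value", "degree")

p <- ggplot(long, aes(x = degree, y = value)) +
  geom_boxplot()
```

Because the degree comes from the list names, this relies on Sobel being named, as it is with the Map() version above.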
I am trying to implement a Chebyshev filter to smooth a time series but, unfortunately, there are NAs in the data series.
For example,
t <- seq(0, 1, len = 100)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))
I am using the Chebyshev filter from the signal package: cf1 = cheby1(5, 3, 1/44, type = "low")
I want to filter the time series while excluding the NAs, without messing up the order/positions of the values. I have already tried na.rm=T, but there seems to be no such argument.
Then
z <- filter(cf1, x) # apply filter
Thank you guys.
Try using x <- x[!is.na(x)] to remove the NAs, then run the filter.
You can remove the NAs beforehand using the complete.cases function. You also might consider imputing the missing data; check out the mtsdi or Amelia II packages.
EDIT:
Here's a solution with Rcpp. This might be helpful if speed is important:
require(inline)
require(Rcpp)
t <- seq(0, 1, len = 100)
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)),NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)))
NAs <- x
x2 <- x[!is.na(x)]
#do something to x2
src <- '
  Rcpp::NumericVector vecX(vx);
  // clone so the R vector passed as vNA is not modified in place
  Rcpp::NumericVector vecNA = Rcpp::clone(Rcpp::NumericVector(vNA));
  int j = 0; //counter for vx
  for (int i = 0; i < vecNA.size(); i++) {
    if (!(R_IsNA(vecNA[i]))) {
      //replace and update j
      vecNA[i] = vecX[j];
      j++;
    }
  }
  return Rcpp::wrap(vecNA);
'
fun <- cxxfunction(signature(vx="numeric",
vNA="numeric"),
src,plugin="Rcpp")
if (identical(x,fun(x2,NAs)))
print("worked")
# [1] "worked"
I don't know whether ts objects can have missing values, but if you just want to re-insert the NA values, you can use insert() from R.utils (see ?insert). There might be a better way to do this.
install.packages(c('R.utils', 'signal'))
require(R.utils)
require(signal)
t <- seq(0, 1, len = 100)
set.seed(7337)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA, NA, cos(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA)
cf1 = cheby1(5, 3, 1/44, type = "low")
xex <- na.omit(x)
z <- filter(cf1, xex) # apply
z <- as.numeric(z)
for (m in attributes(xex)$na.action) {
  z <- insert(z, ats = m, values = NA)
}
all.equal(is.na(z), is.na(x))
?insert
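If you'd rather not depend on R.utils, plain indexing does the same re-insertion: filter the non-NA values and write the result back into a full-length vector at the non-NA positions. To keep the sketch runnable without the signal package, a 5-point moving average via stats::filter() stands in for the Chebyshev filter (an assumption; with signal loaded, the smoothing line would become signal::filter(cf1, x[ok])). As with the approaches above, values on either side of a gap are treated as contiguous.

```r
set.seed(7337)
t <- seq(0, 1, len = 100)
x <- c(sin(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA, NA,
       cos(2*pi*t*2.3) + 0.25*rnorm(length(t)), NA)

ok <- !is.na(x)  # positions that hold real observations

# Stand-in smoother (5-point trailing moving average) so no extra package is
# needed; with the signal package this would be signal::filter(cf1, x[ok])
smoothed <- stats::filter(x[ok], rep(1/5, 5), sides = 1)

# Full-length result: NAs stay where they were, filtered values fill the rest
z <- rep(NA_real_, length(x))
z[ok] <- as.numeric(smoothed)
```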
Here is a function you can use to filter a signal with NAs in it.
The NAs are ignored rather than replaced by zero.
You can then specify a maximum percentage of weight which the NAs may take at any point of the filtered signal. If there are too many NAs (and too few actual data) at a specific point, the filtered signal itself will be set to NA.
# This function applies a filter to a time series with potentially missing data
filter_with_NA <- function(x,
                           window_length=12, # will be applied centrally
                           myfilter=rep(1/window_length,window_length), # a boxcar filter by default
                           max_percentage_NA=25) # which percentage of weight created by NA should not be exceeded
{
  # make the signal longer at both sides
  signal <- c(rep(NA,window_length),x,rep(NA,window_length))
  # see where data are present and not NA
  present <- is.finite(signal)
  # replace the NA values by zero
  signal[!is.finite(signal)] <- 0
  # apply the filter (stats::filter explicitly, because signal::filter
  # masks it when the signal package is loaded)
  filtered_signal <- as.numeric(stats::filter(signal,myfilter, sides=2))
  # find out which percentage of the filtered signal was created by non-NA values
  # this is easy because the filter is linear
  original_weight <- as.numeric(stats::filter(present,myfilter, sides=2))
  # where this is lower than one, the signal is now artificially smaller
  # because we added zeros - compensate that
  filtered_signal <- filtered_signal / original_weight
  # but where there are too few values present, discard the signal
  filtered_signal[100*(1-original_weight) > max_percentage_NA] <- NA
  # cut away the padding to left and right which we previously inserted
  filtered_signal <- filtered_signal[((window_length+1):(window_length+length(x)))]
  return(filtered_signal)
}