fix x-axis ordering in ggplot2 - r

I have the following data
dftmp
z.transient kprimes groupname
1 -1.244061 10 k=9.8,p=56.4
2 -0.995249 20 k=9.8,p=56.4
3 -0.746437 30 k=9.8,p=56.4
4 -0.497625 40 k=9.8,p=56.4
5 -0.248812 50 k=9.8,p=56.4
6 0 60 k=9.8,p=56.4
7 0.248812 70 k=9.8,p=56.4
8 0.497625 80 k=9.8,p=56.4
9 0.746437 90 k=9.8,p=56.4
10 0.995249 100 k=9.8,p=56.4
11 1.244061 110 k=9.8,p=56.4
12 -1.244061 100 k=103.2,p=155.5
13 -0.995249 200 k=103.2,p=155.5
14 -0.746437 300 k=103.2,p=155.5
15 -0.497625 400 k=103.2,p=155.5
16 -0.248812 500 k=103.2,p=155.5
17 0 600 k=103.2,p=155.5
18 0.248812 700 k=103.2,p=155.5
19 0.497625 800 k=103.2,p=155.5
20 0.746437 900 k=103.2,p=155.5
21 0.995249 1000 k=103.2,p=155.5
22 1.244061 1100 k=103.2,p=155.5
23 -1.244061 1000 k=786.9,p=849
24 -0.995249 2000 k=786.9,p=849
25 -0.746437 3000 k=786.9,p=849
26 -0.497625 4000 k=786.9,p=849
27 -0.248812 5000 k=786.9,p=849
28 0 6000 k=786.9,p=849
29 0.248812 7000 k=786.9,p=849
30 0.497625 8000 k=786.9,p=849
31 0.746437 9000 k=786.9,p=849
32 0.995249 10000 k=786.9,p=849
33 1.244061 11000 k=786.9,p=849
I would like to plot it with ggplot2
p <- ggplot(dftmp, aes(x=z.transient, y=kprimes, group=groupname))
p <- p + geom_line(aes(colour=groupname), size=2)
p <- p + scale_y_log10()
But ggplot2 seems to order the factor starting from 0, then alternating negative and positive, so I get a wavy line in each plot:
How do I reorder the x factor? Also, how do I specify the axis limits for the y-axis?

That's because your z.transient is a factor, as you said. It seems to me that it's really a continuous variable, no? If so, convert it from a factor back to its numeric values (see ?factor):
dftmp$z.transient <- as.numeric(levels(dftmp$z.transient))[dftmp$z.transient]
Also, if I use your data as posted, the plot looks fine, since z.transient is read as numeric. Try it:
dftmp <- read.table('clipboard')
Then follow your plotting steps...
As for the axis limits, this post should steer you in the right direction.
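For the y-axis limits specifically, a minimal sketch (the limit values here are just placeholders):
p <- p + scale_y_log10(limits = c(10, 11000))   # sets the range; points outside are dropped
# or, to zoom in without dropping any data:
p <- p + coord_cartesian(ylim = c(10, 11000))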

Related

How to generate a sequence for a grid of values, instead of a straight line?

The sequences need to be:
- `a0grid`: increments of 100 from 500 to 2500 [500, 600, 700, ..., 2300, 2400, 2500]
- `a1grid`: increments of 10 from -100 to 100 [-100, -90, -80, ..., 80, 90, 100]
Initialization:
a0 <- seq(500,2500,by=100)
a1 <- seq(-100,100,by=10)
[1] 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500
[1] -100 -90 -80 -70 -60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60 70 80 90 100
What I'm getting right now:
What I'm looking for (not exactly, but just points everywhere as shown):
You might use expand.grid if you want points everywhere:
library(ggplot2)
ggplot(expand.grid(a0, a1)) +
  geom_point(aes(x = Var1, y = Var2))
This plot could also be done without any packages:
plot(expand.grid(a0, a1), pch = 16)
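If you also want the axes labelled with the original sequence names, a small variation (the column naming here is my own, not from the answer) is to name the arguments to expand.grid:
library(ggplot2)
grid <- expand.grid(a0grid = a0, a1grid = a1)   # named arguments become column names
ggplot(grid) + geom_point(aes(x = a0grid, y = a1grid))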

How to use NSE inside fct_reorder() in ggplot2

I would like to know how to use an NSE (non-standard evaluation) expression in fct_reorder() in ggplot2, so that I can replicate charts across different data frames.
This is an example of a data frame that I use to draw a chart:
travel_time_br30 travel_time_br30_int time_reduction shift not_shift total share
1 0-30 0 10 2780 3268 6048 45.96561
2 0-30 0 20 2779 3269 6048 45.94907
3 0-30 0 30 2984 3064 6048 49.33862
4 0-30 0 40 3211 2837 6048 53.09193
5 30-60 30 10 2139 2007 4146 51.59190
6 30-60 30 20 2159 1987 4146 52.07429
7 30-60 30 30 2363 1783 4146 56.99469
8 30-60 30 40 2478 1668 4146 59.76845
9 60-90 60 10 764 658 1422 53.72714
10 60-90 60 20 721 701 1422 50.70323
11 60-90 60 30 782 640 1422 54.99297
12 60-90 60 40 801 621 1422 56.32911
13 90-120 90 10 296 224 520 56.92308
14 90-120 90 20 302 218 520 58.07692
15 90-120 90 30 317 203 520 60.96154
16 90-120 90 40 314 206 520 60.38462
17 120-150 120 10 12 10 22 54.54545
18 120-150 120 20 10 12 22 45.45455
19 120-150 120 30 10 12 22 45.45455
20 120-150 120 40 13 9 22 59.09091
21 150-180 150 10 35 21 56 62.50000
22 150-180 150 20 40 16 56 71.42857
23 150-180 150 30 40 16 56 71.42857
24 150-180 150 40 35 21 56 62.50000
This is the script that draws a chart from the above data frame:
g.var <- "travel_time_br30"
go.var <- "travel_time_br30_int"
test %>% ggplot(.,aes_(x=as.name(x.var),y=as.name("share"),group=as.name(g.var))) +
geom_line(size=1.4, aes(
color=fct_reorder(travel_time_br30,order(travel_time_br30_int))))
As I have several data frames with different fields, such as access_time_br30 and access_time_br30_int instead of travel_time_br30 and travel_time_br30_int, I set two variables (g.var and go.var) so I can easily replicate multiple charts with the same script.
Since I need to reorder the factor group numerically (specifically, ordering travel_time_br30 by travel_time_br30_int), I am using fct_reorder() inside ggplot(., aes_(...)). However, if I use aes_ with fct_reorder() in geom_line(), as in the following script, it returns the error Error: `f` must be a factor (or character vector).
geom_line(size=1.4, aes_(color=fct_reorder(as.name(g.var),order(as.name(go.var)))))
fct_reorder() does not seem to have an underscored version like fct_reorder_().
Is it impossible to use both aes_ and fct_reorder() together, or is there another solution?
Based on my novice working knowledge of tidy-eval, you could transform your factor order in mutate() before passing the data into ggplot() and achieve your result.
Sorry, I couldn't easily read in your table above because of the line wrapping, so I made a new example from mtcars that I think captures your intent (let me know if it doesn't).
library(dplyr)

mtcars2 <- mutate(mtcars,
                  gear_int = 6 - gear,
                  gear_intrev = rev(gear_int)) %>%
  mutate_at(vars(cyl, gear), as.factor)
library(rlang)
library(forcats)
library(ggplot2)

gg_reorder <- function(data, col_var, col_order) {
  eq_var <- sym(col_var)   # sym() is flexible and my novice preference
  eq_ord <- sym(col_order)
  data %>%
    mutate(!!quo_name(eq_var) := fct_reorder(!!eq_var, !!eq_ord)) %>%
    ggplot(aes_(~mpg, ~hp, color = eq_var)) +
    geom_line()
}
And now put it to use plotting...
gg_reorder(mtcars2, "gear", "gear_int")
gg_reorder(mtcars2, "gear", "gear_intrev")
I didn't specify all of the aes_() variables as strings, but you could pass those as text and use the as.name() pattern. If you want more tidy-eval patterns, Edwin Thoen wrote up a bunch of common cases.
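As a side note (my own sketch, not from the answer above): with more recent rlang/dplyr versions you could avoid aes_ entirely and use the .data pronoun, along the lines of the mtcars2 example:
library(dplyr)
library(forcats)
library(ggplot2)

gg_reorder2 <- function(data, col_var, col_order) {
  data %>%
    # .ordered is a hypothetical helper column holding the reordered factor
    mutate(.ordered = fct_reorder(factor(.data[[col_var]]), .data[[col_order]])) %>%
    ggplot(aes(mpg, hp, color = .ordered)) +
    geom_line()
}

gg_reorder2(mtcars2, "gear", "gear_int")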

Error in R code for an LPPL model

I am learning R and ran into a problem when trying to fit an LPPL model using nls. I used monthly data for the KLSE.
library(tseries)
library(zoo)
ts <- read.table(file.choose(), header=TRUE)
ts
rdate Close Date
1 8/1998 302.91 0
2 9/1998 373.52 100
3 10/1998 405.33 200
4 11/1998 501.47 300
5 12/1998 586.13 400
6 1/1999 591.43 500
7 2/1999 542.23 600
8 3/1999 502.82 700
9 4/1999 674.96 800
10 5/1999 743.04 900
11 6/1999 811.10 1000
12 7/1999 768.69 1100
13 8/1999 767.06 1200
14 9/1999 675.45 1300
15 10/1999 742.87 1400
16 11/1999 734.66 1500
17 12/1999 812.33 1600
18 1/2000 922.10 1700
19 2/2000 982.24 1800
20 3/2000 974.38 1900
21 4/2000 898.35 2000
22 5/2000 911.51 2100
23 6/2000 833.37 2200
24 7/2000 798.83 2300
25 8/2000 795.84 2400
26 9/2000 713.51 2500
27 10/2000 752.36 2600
28 11/2000 729.95 2700
29 12/2000 679.64 2800
30 1/2001 727.73 2900
31 2/2001 709.39 3000
32 3/2001 647.48 3100
33 4/2001 584.50 3200
34 5/2001 572.88 3300
35 6/2001 592.99 3400
36 7/2001 659.40 3500
37 8/2001 687.16 3600
38 9/2001 615.34 3700
39 10/2001 600.07 3800
40 11/2001 638.02 3900
41 12/2001 696.09 4000
42 1/2002 718.82 4100
43 2/2002 708.91 4200
44 3/2002 756.10 4300
45 4/2002 793.99 4400
46 5/2002 741.76 4500
47 6/2002 725.44 4600
48 7/2002 721.59 4700
49 8/2002 711.36 4800
50 9/2002 638.01 4900
51 10/2002 659.57 5000
52 11/2002 629.22 5100
53 12/2002 646.32 5200
54 1/2003 664.77 5300
55 2/2003 646.80 5400
56 3/2003 635.72 5500
57 4/2003 630.37 5600
58 5/2003 671.46 5700
59 6/2003 691.96 5800
60 7/2003 720.56 5900
61 8/2003 743.30 6000
62 9/2003 733.45 6100
63 10/2003 817.12 6200
64 11/2003 779.28 6300
65 12/2003 793.94 6400
66 1/2004 818.94 6500
67 2/2004 879.24 6600
68 3/2004 901.85 6700
69 4/2004 838.21 6800
70 5/2004 810.67 6900
71 6/2004 819.86 7000
72 7/2004 833.98 7100
73 8/2004 827.98 7200
74 9/2004 849.96 7300
75 10/2004 861.14 7400
76 11/2004 917.19 7500
77 12/2004 907.43 7600
78 1/2005 916.27 7700
79 2/2005 907.38 7800
80 3/2005 871.35 7900
81 4/2005 878.96 8000
82 5/2005 860.73 8100
83 6/2005 888.32 8200
84 7/2005 937.39 8300
85 8/2005 913.56 8400
86 9/2005 927.54 8500
87 10/2005 910.76 8600
88 11/2005 896.13 8700
89 12/2005 899.79 8800
90 1/2006 914.01 8900
91 2/2006 928.94 9000
92 3/2006 926.63 9100
93 4/2006 949.23 9200
94 5/2006 927.78 9300
95 6/2006 914.69 9400
96 7/2006 935.85 9500
97 8/2006 958.12 9600
98 9/2006 967.55 9700
99 10/2006 988.30 9800
100 11/2006 1080.66 9900
101 12/2006 1096.24 10000
102 1/2007 1189.35 10100
103 2/2007 1196.45 10200
104 3/2007 1246.87 10300
105 4/2007 1322.25 10400
106 5/2007 1346.89 10500
107 6/2007 1354.38 10600
108 7/2007 1373.71 10700
109 8/2007 1273.93 10800
110 9/2007 1336.30 10900
111 10/2007 1413.65 11000
112 11/2007 1396.98 11100
113 12/2007 1445.03 11200
df <- data.frame(ts)
df <- data.frame(Date=df$Date,Y=df$Close)
df <- df[!is.na(df$Y),]
library(minpack.lm)
library(ggplot2)
f <- function(pars, xx){pars$a+pars$b*(pars$tc-xx)^pars$m* (1+pars$c*cos(pars$omega*log(pars$tc-xx)+pars$phi))}
resids <- function(p,observed,xx){df$Y-f(p,xx)}
nls.out<-nls.lm(par=list(a=7.048293, b=-8.8e-5, tc=112000, m=0.5, omega=3.03, phi=-9.76, c=-14), fn=resids, observed=df$Y, xx=df$days, control=nls.lm.control(maxiter=1024, ftol=1e-6, maxfev=1e6))
par <- nls.out$par
nls.final<-nls(Y~a+(tc-days)^m*(b+c*cos(omega*log(tc-days)+phi)), data=df, start=par, algorithm="plinear", control=nls.control(maxiter=1024, minFactor=1e-8))
Error in qr.solve(QR.B, cc) : singular matrix 'a' in solve
I get a singular-matrix error. What do I need to change to avoid this error?
Your problem is that the cosine term is zero for some value, which makes the matrix singular; you basically need to constrain the parameter space. Additionally, I would read more of the literature, since some careful trigonometric rework removes the phi parameter; this improves the nonlinear optimization enough to get useful and reproducible results.
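As a rough illustration of constraining the parameter space (my own sketch, not code from the question; the bounds, and using df$Date as the time index in place of the question's df$days, are assumptions), nls.lm accepts lower and upper bounds on the parameters:
library(minpack.lm)

# Starting values as in the question; the bounds keep tc above the last observation
# so (tc - xx) stays positive, and box m, omega and phi so the power-law and cosine
# terms cannot degenerate. The exact bound values are guesses and need tuning.
start <- list(a = 7.048293, b = -8.8e-5, tc = 112000, m = 0.5,
              omega = 3.03, phi = -9.76, c = -14)
lower <- c(a = -Inf, b = -Inf, tc = max(df$Date) + 1, m = 0.1, omega = 0.1, phi = -4*pi, c = -Inf)
upper <- c(a =  Inf, b =  Inf, tc = 1e6,              m = 0.9, omega = 20,  phi =  4*pi, c =  Inf)

nls.out <- nls.lm(par = start, lower = lower, upper = upper, fn = resids,
                  observed = df$Y, xx = df$Date,
                  control = nls.lm.control(maxiter = 1024, ftol = 1e-6, maxfev = 1e6))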

Create a new column based on the conditions of other columns

The raw data are presented below:
Year Price Volume P1 P2 P3 V1 V2 V3
2009 46 125 25 50 75 200 400 600
2009 65 800 25 50 75 200 400 600
2010 20 560 30 55 90 250 500 800
2010 15 990 30 55 90 250 500 800
2011 89 350 35 70 120 250 500 800
2012 23 100 35 70 120 250 500 800
... ... ... ... ... ... ... ... ...
I am trying to create a new column named Portfolio. If Price and Volume are smaller than P1 and V1, respectively, Portfolio equals 11. Otherwise, if Price is smaller than P1 and Volume is smaller than V2, Portfolio equals 12, and so on.
There are 3 breakpoints each for Price and Volume, so 16 portfolios are created, named 11, 12, 13, 14, 21, 22, 23, 24, ..., 44.
The result would be the table below:
Year Price Volume P1 P2 P3 V1 V2 V3 Portfolio
2009 46 125 25 50 75 200 400 600 21
2009 65 800 25 50 75 200 400 600 34
2010 20 560 30 55 90 250 500 800 13
2010 15 990 30 55 90 250 500 800 14
2011 89 350 35 70 120 250 500 800 32
2012 23 100 35 70 120 250 500 800 11
... ... ... ... ... ... ... ... ... ...
Could you please help me solve this issue? I tried if(){} and else if(){} statements, but I did not get the result shown in the second table. That is why I posted the raw data here. Thank you so much.
The code I tried was the following:
if ((Price<P1)&&(Volume<V1)){data$Portfolio=11}
else if ((Price<P1)&&(Volume<V2)){data$Portfolio=12}
else if((Price<P1)&&(Volume<V3)){data$Portfolio=13}
else if(Price<P1){data$Portfolio=14}
else if((Price<P2)&&(Volume<V1)){Fin_Ret$port=21}
...
else if(Price>P3){data$Portfolio=44}
The output was,
> if ((Price<P1)&&(Volume<V1)){data$Portfolio=11}
> else if ((Price<P1)&&(Volume<V2)){data$Portfolio=12}
Error: unexpected 'else' in "else"
...
When I tried "&" instead of "&&", the result showed:
> if ((mkvalt<MV20)&(BM<BM20)){Fin_Ret$port=11}
Warning message:
In if ((mkvalt < MV20) & (BM < BM20)) { :
the condition has length > 1 and only the first element will be used
I am confused; maybe I don't understand something fundamental in R.
You can use:
df$Portfolio[(df$Price<df$P1)&(df$Volume<df$V1)] <- 11
df$Portfolio[(df$Price<df$P1)&(df$Volume<df$V2) & is.na(df$Portfolio)] <- 12
or using dplyr::mutate
library(dplyr)
df <- df %>%
mutate(Portfolio=ifelse((Price<P1)&(Volume<V1),11,NA)) %>%
mutate(Portfolio=ifelse((Price<P1)&(Volume<V2)& is.na(Portfolio),12,Portfolio))
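The same fall-through logic can also be written with dplyr::case_when (a variation on the above, sketched with only the first two branches):
df <- df %>%
  mutate(Portfolio = case_when(
    Price < P1 & Volume < V1 ~ 11,
    Price < P1 & Volume < V2 ~ 12,
    # ... the remaining branches continue in the same order ...
    TRUE                     ~ NA_real_
  ))
case_when() evaluates the conditions in order and takes the first match, so the is.na(Portfolio) guard is not needed.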
In the code you have given,
else if(Price<P1){data$Portfolio=14}
else if((Price<P2)&&(Volume<V1)){Fin_Ret$port=21}
...
else if(Price>P3){data$Portfolio=44}
Remove the if after the else in the last line. You should then be able to get the expected result.
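For what it's worth (a note of my own, not part of the answer above): the unexpected 'else' error comes from entering the chain line by line at the top level, where R treats if (...) {...} as a complete statement before it ever sees the else. Keeping else on the same line as the closing brace, or wrapping the chain in a function, avoids the parse error; the conditions must then still be single values, so such a function classifies one row at a time:
# Sketch of a scalar classifier; only the first two branches are shown,
# the remaining ones follow the same pattern up to 44.
classify <- function(Price, Volume, P1, V1, V2) {
  if (Price < P1 && Volume < V1) {
    11
  } else if (Price < P1 && Volume < V2) {   # "} else if" keeps the parser happy
    12
  } else {
    NA
  }
}
classify(23, 100, 35, 250, 500)   # 11, matching the 2012 row in the expected output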
Here is a different and concise approach using findInterval and data.table. It is based on the observation that the Portfolio id consists of two digits where the first digit is determined solely by the price category and the second digit solely by the volume category.
library(data.table)
dt[, Portfolio := paste0(findInterval(Price,  c(-Inf, P1, P2, P3)),
                         findInterval(Volume, c(-Inf, V1, V2, V3))),
   by = .(P1, P2, P3, V1, V2, V3)]
print(dt)
# Year Price Volume P1 P2 P3 V1 V2 V3 Portfolio
#1: 2009 46 125 25 50 75 200 400 600 21
#2: 2009 65 800 25 50 75 200 400 600 34
#3: 2010 20 560 30 55 90 250 500 800 13
#4: 2010 15 990 30 55 90 250 500 800 14
#5: 2011 89 350 35 70 120 250 500 800 32
#6: 2012 23 100 35 70 120 250 500 800 11
findInterval uses right-open intervals by default, which is in line with the conditions (Price < P1), etc., in the OP's code.
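As a quick illustration of that convention (my own example, using the first row of the data):
findInterval(46, c(-Inf, 25, 50, 75))     # 2: 25 <= 46 < 50, so the price digit is 2
findInterval(125, c(-Inf, 200, 400, 600)) # 1: 125 < 200, so the volume digit is 1
# paste0() of the two digits gives Portfolio "21", matching the first row above.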
Data
To make it a reproducible example
dt <- fread("Year Price Volume P1 P2 P3 V1 V2 V3
2009 46 125 25 50 75 200 400 600
2009 65 800 25 50 75 200 400 600
2010 20 560 30 55 90 250 500 800
2010 15 990 30 55 90 250 500 800
2011 89 350 35 70 120 250 500 800
2012 23 100 35 70 120 250 500 800")

How to make a pie graph that only labels the top n values

I haven't used pie charts much in R. Is there a way to make a pie chart that shows only the top 10 names, with percentages?
For example, here's a simple version of my data:
> data
count METRIC_ID
1 8 71
2 2 1035
3 5 1219
4 4 1277
5 1 1322
6 3 1444
7 5 1462
8 17 1720
9 6 2019
10 2 2040
11 1 2413
12 11 2489
13 24 2610
14 29 2737
15 1 2907
16 1 2930
17 2 2992
18 1 2994
19 2 3020
20 4 3045
21 35 3222
22 2 3245
23 5 3306
24 2 3348
25 2 3355
26 2 3381
27 3 3383
28 4 3389
29 6 3404
30 1 3443
31 22 3465
32 3 3558
33 15 3600
34 3 3730
35 6 3750
36 1 3863
37 1 3908
38 5 3913
39 3 3968
40 9 3972
41 2 3978
42 5 4077
43 4 4086
44 3 4124
45 2 4165
46 3 4205
47 8 4206
48 4 4210
49 12 4222
50 4 4228
and I want to see how the counts are distributed across METRIC_IDs:
pie(data$count, data$METRIC_ID)
But this chart labels every single METRIC_ID; when I have over 100 METRIC_IDs, it looks like a mess. How can I label only the top n (for example, n = 5) METRIC_IDs on the graph, and show the counts for those n METRIC_IDs only?
Thank you for your help!!!
To suppress plotting of some labels, set them to NA. Try this:
labls <- data$METRIC_ID
labls[data$count < 3] <- NA
pie(data$count, labels = labls)   # NA labels are simply not drawn
Simply subset your data before creating the pie chart. I'd do something like the following (a code sketch follows the steps):
Sort your datasets using order.
Select the first ten rows.
Create the pie chart from the resulting data.
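A minimal sketch of those three steps (assuming the data frame is named data, as in the question):
top10 <- data[order(-data$count), ][1:10, ]   # sort by count, keep the ten largest
pie(top10$count, labels = top10$METRIC_ID)    # pie chart of just those ten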
Pie charts are not the best way to visualize your data; just google "pie chart problems", e.g. this link. I'd go for something like:
library(ggplot2)
dat = dat[order(-dat$count),]
dat = within(dat, {METRIC_ID = factor(METRIC_ID, levels = METRIC_ID)})
ggplot(dat, aes(x = METRIC_ID, y = count)) + geom_point()
Here I just plot all the data, which I think still leads to a readable graph. This kind of graph is more formally known as a dotplot, and it is used heavily in Cleveland's book on graphics. Here the height is linked to the count, which is much easier to interpret than linking the count to the fraction of the area of a circle, as a pie chart does.
In short, find a better type of chart for your data.
Here is one possibility for creating the chart you want:
data2 <- data[data$count %in% tail(sort(data$count),5),]
pie(data2$count, data2$METRIC_ID)
Slightly better:
data3 <- data2
data3$METRIC_ID <- as.character(data3$METRIC_ID)
data3 <- rbind(data3,data.frame(count=sum(data[! data$count %in% tail(sort(data$count),5),"count"]),METRIC_ID="others"))
pie(data3$count, data3$METRIC_ID)
