How to give a single line a different size - r

I have this sample data:
head(output.melt,10)
month variable value LineSize
1 01 1997 100.00000 1
2 02 1997 91.84783 1
3 03 1998 92.67626 1
4 04 1998 105.70113 1
5 05 1998 115.12516 1
6 06 1998 118.95298 1
7 07 1999 117.99673 1
8 08 1999 125.50852 1
9 09 1999 119.39502 1
10 10 1999 100.79032 1
11 03 Mean 103.17473 2
12 04 Mean 108.12440 2
13 05 Mean 109.54016 2
14 06 Mean 107.71431 2
15 07 Mean 107.86694 2
16 08 Mean 108.32371 2
17 09 Mean 102.06684 2
18 10 Mean 99.96975 2
19 11 Mean 111.94529 2
20 12 Mean 113.49491 2
I want to make a plot where one line has different linetype and size. I get the different linetype but not size:
theplot <- ggplot(data = output.melt, aes(x = month, y = value, colour = variable, group = variable, linetype = LineSize)) +
  geom_line() +
  scale_linetype(guide = "none") +
  ggtitle("Hello") +
  theme_economist()
But the code above does not make the line where LineSize equals 2 wider than the others, which is what I want. And adding size=LineSize to aes creates an ugly graph.

As suggested in the comments, map size to the numeric value of LineSize inside aes():
theplot <- ggplot(data = output.melt, aes(x = month, y = value, colour = variable, group = variable, size = as.numeric(LineSize))) +
  geom_line() +
  scale_linetype(guide = "none") +
  ggtitle("Hello")
Keep in mind that size = 2 is quite thick, so you may have to adjust the LineSize values in your table.
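Instead of editing the table, the widths can also be set on a scale. The following is a minimal sketch, not the answerer's code: it assumes ggplot2 3.4.0 or later (where lines use the linewidth aesthetic), that LineSize is a factor with levels "1" and "2", and the small data frame and the widths 0.5 and 1.2 are made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for output.melt (a few rows only)
output.melt <- data.frame(
  month    = rep(c("03", "04", "05"), times = 2),
  variable = rep(c("1998", "Mean"), each = 3),
  value    = c(92.68, 105.70, 115.13, 103.17, 108.12, 109.54),
  LineSize = factor(rep(c(1, 2), each = 3))
)

theplot <- ggplot(output.melt,
                  aes(x = month, y = value, colour = variable, group = variable,
                      linetype = LineSize, linewidth = LineSize)) +
  geom_line() +
  scale_linetype(guide = "none") +
  # Assumed widths per LineSize level; adjust to taste
  scale_linewidth_manual(values = c("1" = 0.5, "2" = 1.2), guide = "none") +
  ggtitle("Hello")
```

On older ggplot2 versions the same idea would use size and scale_size_manual() instead.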

Accumulated data in pivot mode

Currently I accumulate down columns with row_cumsum:
test
| project Boenheter, Ar, Maned, ManedTLA
| extend _date = make_datetime(toint(Ar), Maned, 1)
| extend key1 = Ar, __auto0 = datetime_part('Month', startofmonth(_date))
| summarize value0 = sum(Boenheter) by key1, __auto0, ManedTLA
| order by __auto0 asc, key1 asc
| serialize value0 = row_cumsum(value0, __auto0 != prev(__auto0))
| extend __p = pack(tostring(ManedTLA), value0)
| summarize __p = make_bag(__p) by key1
| evaluate bag_unpack(__p)
| order by key1 asc
But I want to accumulate along each row instead:
Feb = Jan + Feb, Mar = Jan + Feb + Mar, etc., so Feb = 304, Mar = 624 (taking 2012 as an example), and so on.
Does Kusto have a way to accumulate across a row instead of down a column (row_cumsum)?
Help please.
Use row_cumsum, with a restart on year change, before using pivot:
// Generation of a data sample. Not part of the solution.
let t = materialize(range i from 1 to 200 step 1 | extend dt = ago(365d*10*rand()));
// The solution starts here.
t
| summarize count() by year = getyear(dt), month = format_datetime(dt,'MM')
| order by year asc, month asc
| extend cumsum = row_cumsum(count_, year != prev(year))
| evaluate pivot(month, any(cumsum), year)
year  01  02  03  04  05  06  07  08  09  10  11  12
(Empty cells were lost when the table was flattened, so each year's values are listed in order and are not aligned to the month columns.)
2012  2 4 6 7 10 14 16
2013  2 3 7 8 10 11 15 16 17 18
2014  2 7 11 12 13 14 15 17 19 20
2015  2 3 6 10 11 12 13 14 15
2016  1 2 3 5 6 8 10 11 12 15 16 19
2017  1 2 5 8 13 16 17 20 21
2018  4 5 8 12 15 18 20 23 24 25 26
2019  5 7 8 10 11 14 18 19 20 21
2020  2 5 8 10 11 13 15 16 19 22
2021  2 5 6 7 8 9 11 17
2022  2 4 5

ggplot2 or sjPlot sum stacked barplot columns

I am running R version 3.5.2 (2018-12-20) -- "Eggshell Igloo" on a MacBook Pro, OS 10.14.2.
I have tried a few methods to get these plots. My preferred method is a stacked barplot of my data: the year factor on the x axis, the counts on the y axis, and each column split by the counts of a dichotomous variable (0/1) so that the segments add up to the count on the y axis. However, I am flexible. I have the following code that works; overlaying a barplot on it would also help.
ggplot(dat, aes(x=factor(yr),y=n, group=(n>0)))+
stat_summary(aes(color=(n>0)),fun.y=length, geom="line")+
scale_color_discrete("Key",labels=c("NN", "N"))+
labs(title= "1992-2018", x="Years",y="n")
Using my full dataset, I tried this and got really close to the stacked barplot. It gave me the correct counts per the "yr" variable, but for my variable "n" it gave me a continuous range from 0 to 1.0.
p <- ggplot(data = dat, aes(x = yr, y = n, fill = n)) +
  geom_bar(stat = "identity")
This is the data I am most interested in. I tried to then coerce it into a table then a data frame.
t2<- table(dat$yr, dat$n)
0 1
1992 6 0
1993 10 0
1994 3 1
1995 20 2
1996 15 2
1997 16 0
1998 16 0
1999 9 3
2000 5 0
2001 5 1
2002 7 1
2003 9 2
2004 4 3
2005 6 3
2006 5 3
2007 6 3
2008 4 3
2009 8 4
2010 7 1
2011 4 5
2012 4 5
2013 6 2
2014 0 2
2015 3 3
2016 5 5
2017 4 4
2018 8 5
t<-table(dat$yr)
1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004
6 10 4 22 17 16 16 12 5 6 8 11 7
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017
9 8 9 7 12 8 9 9 8 2 6 10 8
2018
13
I then tried:
df<- data.frame(t, t2)
head(df)
Var1 Freq Var1.1. Var2 Freq.1
1 1992 6 1992 0 6
2 1993 10 1993 0 10
3 1994 4 1994 0 3
4 1995 22 1995 0 20
5 1996 17 1996 0 15
6 1997 16 1997 0 16
p<-ggplot(data=df, aes(x=Var1, y=Var2)) +
geom_bar(stat="identity")
p
Replacing these with the dataset variables gave me worse results: the y-axis showed no counts per year for the "yr" variable, and each column was filled all the way to the top of the range at 1.
Again, I would like a stacked barplot with the binary "n" in each year column showing the 0/1 sums, which should match the "yr" counts on the y-axis. Alternatively, I can use the ggplot from the first code I posted and get the sums for each year there; I would take that as well.
This comes really, really close. If it also gave a total at the top, it would be perfect.
Using the sjPlot package:
sjp.grpfrq(dat$yr, dat$n, bar.pos = c("stack"), show.values = TRUE, show.n = TRUE, show.prc = FALSE, title = NULL)
The major issue with the sjPlot code is that I cannot change the legend labels. It shows n = 0, 1, and I need to change this to something specific.
Thanks so much in advance!
Try this and see if that's what you want?
ggplot(data=df, aes(x=Var1, y=Freq)) +
geom_bar(stat="identity")
Resolved.
sjp.grpfrq(dat$yr, dat$n, bar.pos = c("stack"), legend.title = "Key", legend.labels = c("NN", "N"), show.values = TRUE, show.n = TRUE, show.prc = FALSE, show.axis.values = TRUE, title = "1992-2018")
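For the ggplot2 route the question also asked about, a stacked bar chart with custom legend labels and a total above each bar could look roughly like the sketch below. It is not from the thread: the tiny dat data frame is a made-up stand-in (one row per observation, binary n), and the label positions are an assumption.

```r
library(ggplot2)

# Hypothetical stand-in for dat: one row per observation, n is 0 or 1
dat <- data.frame(
  yr = c(1992, 1992, 1993, 1994, 1994, 1994),
  n  = c(0, 0, 0, 0, 1, 1)
)

# Per-year totals, used for the label on top of each bar
totals <- as.data.frame(table(yr = dat$yr))

p <- ggplot(dat, aes(x = factor(yr))) +
  # geom_bar() counts rows; fill splits each bar into the 0 and 1 counts
  geom_bar(aes(fill = factor(n))) +
  geom_text(data = totals, aes(x = yr, y = Freq, label = Freq), vjust = -0.5) +
  scale_fill_discrete("Key", labels = c("NN", "N")) +
  labs(title = "1992-2018", x = "Years", y = "n")
```

Because geom_bar() does the counting itself, the raw data can be passed in directly, without building the table first.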

Working with dataframes from unique function

I was wondering how I could change data like this, from data frames I created:
Variable Freq and Variable Freq
01 3 M 10
02 2
03 4
04 5
to
01 3
02 2
03 4
04 5
M 10
The code I am using to get those two tables is:
y = as.data.frame(length(unique(index_visit$PatientID)))
x = as.data.frame(table(index_visit$ProcedureID))
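One possible way to stack the two results is to give both the same column names and combine them with rbind(). This is only a sketch under assumptions: the index_visit data below is invented, and the row label "M" is taken from the desired output shown above.

```r
# Hypothetical stand-in for index_visit
index_visit <- data.frame(
  PatientID   = c(1, 1, 2, 3, 3, 4),
  ProcedureID = c("01", "01", "02", "03", "03", "03")
)

# Frequency table of procedures, as a data frame with character labels
x <- as.data.frame(table(index_visit$ProcedureID), stringsAsFactors = FALSE)
names(x) <- c("Variable", "Freq")

# Single-row data frame holding the number of unique patients
y <- data.frame(
  Variable = "M",
  Freq     = length(unique(index_visit$PatientID))
)

# Same column names, so the rows can simply be appended
combined <- rbind(x, y)
```

With the invented data, combined has one row per procedure followed by the "M" row.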

Difference in Timestamp

I want to calculate the time difference between two events. The first five columns give the date-time of the incident. The remaining five columns give the date-time of death.
dat <- read.table(header=TRUE, text="
YEAR MONTH DAY HOUR MINUTE D.YEAR D.MONTH D.DAY D.HOUR D.MINUTE
2013 1 6 0 55 2013 1 6 0 56
2013 2 3 21 24 2013 2 4 23 14
2013 1 6 11 45 2013 1 6 12 29
2013 3 6 12 25 2013 3 6 23 55
2013 4 6 18 28 2013 5 3 11 18
2013 4 8 14 31 2013 4 8 14 32")
dat
YEAR MONTH DAY HOUR MINUTE D.YEAR D.MONTH D.DAY D.HOUR D.MINUTE
2013 1 6 0 55 2013 1 6 0 56
2013 2 3 21 24 2013 2 4 23 14
2013 1 6 11 45 2013 1 6 12 29
2013 3 6 12 25 2013 3 6 23 55
2013 4 6 18 28 2013 5 3 11 18
2013 4 8 14 31 2013 4 8 14 32
I want to calculate the difference of time (in minutes). The following code is not going anywhere. The timestamp will look like 2013-04-06 04:08.
library(lubridate)
dat$tstamp1 <- mdy(paste(dat$YEAR, dat$MONTH, dat$DAY, dat$HOUR, dat$MINUTE,sep = "-"))
dat$tstamp2 <- mdy(paste(dat$D.YEAR, dat$D.MONTH, dat$D.DAY, dat$D.HOUR, dat$D.MINUTE, sep = "-"))
dat$diff <- dat$tstamp2 - dat$tstamp1 ### want the difference in minutes
In order to parse a date/time string of the "-"-separated format you're creating, you'll need to give a custom format, and pass it to parse_date_time. For example:
parse_date_time(paste(dat$D.YEAR, dat$D.MONTH, dat$D.DAY, dat$D.HOUR, dat$D.MINUTE, sep = "-"),
"%Y-%m-%d-%H-%M")
Your new code would therefore look like:
library(lubridate)
dat$tstamp1 <- parse_date_time(paste(dat$YEAR, dat$MONTH, dat$DAY, dat$HOUR, dat$MINUTE, sep = "-"),
"%Y-%m-%d-%H-%M")
dat$tstamp2 <- parse_date_time(paste(dat$D.YEAR, dat$D.MONTH, dat$D.DAY, dat$D.HOUR, dat$D.MINUTE, sep = "-"),
"%Y-%m-%d-%H-%M")
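Then the following will get you the time difference in minutes. Requesting the units explicitly matters, because plain subtraction picks the units automatically from the data:
dat$diff <- as.numeric(difftime(dat$tstamp2, dat$tstamp1, units = "mins"))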
You can try this:
# strptime is base R, so no package is needed
dat$tstamp1 <- strptime(paste(dat$YEAR, dat$MONTH, dat$DAY, dat$HOUR, dat$MINUTE, sep = "-"), "%Y-%m-%d-%H-%M")
dat$tstamp2 <- strptime(paste(dat$D.YEAR, dat$D.MONTH, dat$D.DAY, dat$D.HOUR, dat$D.MINUTE, sep = "-"), "%Y-%m-%d-%H-%M")
dat$diff <- difftime(as.POSIXct(dat$tstamp2), as.POSIXct(dat$tstamp1), units = "mins")
Using strptime is faster and a bit safer against unexpected data.
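The string pasting can also be skipped altogether. The sketch below is an alternative not given in the thread: it builds the timestamps from the numeric columns with base R's ISOdatetime() and asks difftime() for minutes. Using tz = "UTC" is an assumption made here so that daylight saving changes cannot shift the result.

```r
dat <- read.table(header = TRUE, text = "
YEAR MONTH DAY HOUR MINUTE D.YEAR D.MONTH D.DAY D.HOUR D.MINUTE
2013 1 6 0 55 2013 1 6 0 56
2013 2 3 21 24 2013 2 4 23 14
2013 1 6 11 45 2013 1 6 12 29
2013 3 6 12 25 2013 3 6 23 55
2013 4 6 18 28 2013 5 3 11 18
2013 4 8 14 31 2013 4 8 14 32")

# Build POSIXct timestamps directly from the numeric columns (seconds = 0)
incident <- ISOdatetime(dat$YEAR, dat$MONTH, dat$DAY,
                        dat$HOUR, dat$MINUTE, 0, tz = "UTC")
death    <- ISOdatetime(dat$D.YEAR, dat$D.MONTH, dat$D.DAY,
                        dat$D.HOUR, dat$D.MINUTE, 0, tz = "UTC")

# Explicit units, so the result is always in minutes
dat$diff_min <- as.numeric(difftime(death, incident, units = "mins"))
dat$diff_min
# 1 1550 44 690 38450 1
```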

How to separate one graph to different new graphs?

I have a database of laws per year from 1985-2012. I would like to make 17 different plots (and thus a function), one for each year, each including that year's values and those of the years before, while keeping the same x and y axis design in every graph, as in the following figure:
This is how I made the graph above, for 1985-2012:
v <- ddply(leg.by.melt, .(year), summarise, count = sum(value))
v
year count
1 1985 2
2 1987 5
3 1988 9
4 1989 12
5 1990 14
6 1991 11
7 1992 16
8 1993 23
9 1994 25
10 1995 10
11 1996 11
12 1997 24
13 1998 35
14 1999 32
15 2000 24
16 2001 22
17 2002 65
18 2003 42
19 2004 56
20 2005 42
21 2006 47
22 2007 36
23 2008 16
24 2009 54
25 2011 28
ggplot(v, aes(x = year, y = count)) +
  theme_bw() +
  geom_contour(colour = "black", lty = 3, lend = 2, lwd = 1, stat = "identity") +
  scale_x_continuous(breaks = round(seq(min(v$year), max(v$year), by = 1), 1)) +
  scale_y_continuous(breaks = round(seq(min(v$count), max(v$count), by = 3), 1)) +
  theme(axis.text.x = element_text(angle = 0, vjust = 0.2))
As I wrote before, I would like to have 17 different plots: for 1985, for 1985+1986, for 1985+1986+1987 and so forth, and still keep the same design of the x and y axes (x axis from 1985 to 2012 and y axis from 2 to 65).
How can I make a function to achieve it?
If your plot is called p, I would do the following:
plyr::l_ply(v$year, function(.year) p %+% subset(v, year <= .year), .print=TRUE)
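Note that swapping the data only keeps the axes identical if the scale limits are fixed; breaks alone do not pin the range. A minimal sketch of a helper that fixes both is shown below. It is not the answerer's code: the function name plot_up_to is hypothetical, v is shortened to four rows, and geom_line() is swapped in for the original geom_contour(stat = "identity").

```r
library(ggplot2)

# Shortened stand-in for v
v <- data.frame(year = c(1985, 1987, 1988, 1989), count = c(2, 5, 9, 12))

# Hypothetical helper: plot all rows up to last_year, with axis limits
# taken from the full data so that every plot shares the same axes
plot_up_to <- function(data, last_year) {
  ggplot(subset(data, year <= last_year), aes(x = year, y = count)) +
    theme_bw() +
    geom_line(colour = "black", linetype = 3) +
    scale_x_continuous(breaks = seq(min(data$year), max(data$year), by = 1),
                       limits = range(data$year)) +
    scale_y_continuous(limits = range(data$count))
}

# One plot per year; print(plots[[i]]) draws the i-th one
plots <- lapply(v$year, function(y) plot_up_to(v, y))
```

Returning a list rather than printing inside the loop makes it easy to save each plot to its own file afterwards, for example with ggsave().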
