Working with dataframes from unique function - r

I was wondering how I could go about combining some data like this, from two data frames I created:
Variable Freq        Variable Freq
01       3           M        10
02       2
03       4
04       5
to
01 3
02 2
03 4
04 5
M 10
The code I am using to get those two tables is:
y = as.data.frame(length(unique(index_visit$PatientID)))
x = as.data.frame(table(index_visit$ProcedureID))
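Since both objects are small data frames with the same two-column shape, one minimal sketch is to align the column names and stack them with rbind (assuming the names shown above; the "M" label for the unique-patient count is taken from the question):

```r
# x: Variable/Freq table from table(); give y a matching row, then stack
colnames(x) <- c("Variable", "Freq")
y2 <- data.frame(Variable = "M", Freq = y[[1]])  # y holds the single count of unique PatientIDs
combined <- rbind(x, y2)
combined
```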

Related

Accumulated data in pivot mode

Currently I accumulate columns via row_cumsum:
test
| project Boenheter, Ar, Maned, ManedTLA
| extend _date = make_datetime(toint(Ar), Maned, 1)
| extend key1 = Ar, __auto0 = datetime_part('Month', startofmonth(_date))
| summarize value0 = sum(Boenheter) by key1, __auto0, ManedTLA
| order by __auto0 asc, key1 asc
| serialize value0 = row_cumsum(value0, __auto0 != prev(__auto0))
| extend __p = pack(tostring(ManedTLA), value0)
| summarize __p = make_bag(__p) by key1
| evaluate bag_unpack(__p)
| order by key1 asc
But I want to accumulate across rows instead, like this:
Feb = Jan + Feb, Mar = Jan + Feb + Mar, etc. So Feb = 304, Mar = 624 (for the year 2012, as an example), and so on.
Does Kusto have a trick for accumulating across rows instead of columns (row_cumsum)?
Help please!
Use row_cumsum, with restart on year change, before using pivot
// Generation of a data sample. No part of the solution.
let t = materialize(range i from 1 to 200 step 1 | extend dt = ago(365d*10*rand()));
// The solution starts here.
t
| summarize count() by year = getyear(dt), month = format_datetime(dt,'MM')
| order by year asc, month asc
| extend cumsum = row_cumsum(count_, year != prev(year))
| evaluate pivot(month, any(cumsum), year)
(Output: a pivoted table with one row per year, 2012 through 2022, one column per month, 01 through 12, and the cumulative monthly count in each cell.)
Fiddle

Truncating a dataframe according to count of vector elements

I have a dataframe df, containing three vectors:
subject condition value
01 A 12
01 A 6
01 B 10
01 B 2
02 A 5
02 A 11
02 B 3
02 B 5
02 B 9
...
There are four observations (and hence four rows) for subject 01, with two observations corresponding to condition A and two corresponding to condition B. Let's say that due to a technical error, there are three condition B observations for subject 02.
My question is this: how can I truncate df to ensure that each condition only has two observations for each individual subject (hence removing the erroneous third row where condition==B for subject 02)?
Thanks in advance for any assistance!
Here's a dplyr solution -
df %>%
group_by(subject, condition) %>%
filter(row_number() < 3) %>%
ungroup()
# A tibble: 8 x 3
subject condition value
<chr> <chr> <dbl>
1 01 A 12
2 01 A 6
3 01 B 10
4 01 B 2
5 02 A 5
6 02 A 11
7 02 B 3
8 02 B 5
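On recent dplyr versions (1.0+), the same per-group row limit can also be written with slice_head, which states the intent ("keep the first two rows of each group") directly; a sketch:

```r
df %>%
  group_by(subject, condition) %>%
  slice_head(n = 2) %>%
  ungroup()
```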
For each subject/condition pair create a sequence number seq for its rows and then only keep those rows whose sequence number is less than 3.
subset(transform(DF, seq = ave(value, subject, condition, FUN = seq_along)), seq < 3)
giving:
subject condition value seq
1 01 A 12 1
2 01 A 6 2
3 01 B 10 1
4 01 B 2 2
5 02 A 5 1
6 02 A 11 2
7 02 B 3 1
8 02 B 5 2
Note
The input in reproducible form is assumed to be:
Lines <- "subject condition value
01 A 12
01 A 6
01 B 10
01 B 2
02 A 5
02 A 11
02 B 3
02 B 5
02 B 9"
DF <- read.table(text = Lines, header = TRUE, strip.white = TRUE,
colClasses = c("character", "character", "numeric"))

How to make a single line in different size

I have this sample data:
head(output.melt,10)
month variable value LineSize
1 01 1997 100.00000 1
2 02 1997 91.84783 1
3 03 1998 92.67626 1
4 04 1998 105.70113 1
5 05 1998 115.12516 1
6 06 1998 118.95298 1
7 07 1999 117.99673 1
8 08 1999 125.50852 1
9 09 1999 119.39502 1
10 10 1999 100.79032 1
11 03 Mean 103.17473 2
12 04 Mean 108.12440 2
13 05 Mean 109.54016 2
14 06 Mean 107.71431 2
15 07 Mean 107.86694 2
16 08 Mean 108.32371 2
17 09 Mean 102.06684 2
18 10 Mean 99.96975 2
19 11 Mean 111.94529 2
20 12 Mean 113.49491 2
I want to make a plot where one line has different linetype and size. I get the different linetype but not size:
theplot = ggplot(data = output.melt,
                 aes(x = month, y = value, colour = variable,
                     group = variable, linetype = LineSize)) +
  geom_line() +
  scale_linetype(guide = "none") +
  ggtitle("Hello") +
  theme_economist()
But the code above does not make the line (where LineSize equals 2) wider than the others, which is what I want. And mapping size = LineSize inside aes() creates an ugly graph.
As suggested in the comments, you have to use the following code:
theplot = ggplot(data = output.melt,
                 aes(x = month, y = value, colour = variable,
                     group = variable, size = as.numeric(LineSize))) +
  geom_line() +
  scale_linetype(guide = "none") +
  ggtitle("Hello")
Keep in mind that a size of 2 is quite wide, so you may have to adjust the values in your table.
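If you would rather not rescale the values in the table itself, an alternative is to keep the size mapping and compress the output range with scale_size (a sketch, assuming output.melt as above):

```r
library(ggplot2)

# Map LineSize values 1 and 2 onto line widths 0.5 and 1.5 instead of 1 and 2
ggplot(output.melt,
       aes(x = month, y = value, colour = variable,
           group = variable, size = as.numeric(LineSize))) +
  geom_line() +
  scale_size(range = c(0.5, 1.5), guide = "none") +
  ggtitle("Hello")
```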

Add a column that sum the number of sessions per user in R [duplicate]

This question already has answers here:
Add count of unique / distinct values by group to the original data
(3 answers)
Closed 6 years ago.
I am starting to data-mine a mobile application,
and I have a database that looks like this:
Database
UserId Hour Date
01 18 01.01.2016
01 18 01.01.2016
01 14 02.01.2016
01 14 03.01.2016
02 21 03.01.2016
02 08 05.01.2016
02 08 05.01.2016
03 23 05.01.2016
I would like to add a new column to this database that counts the number of different days each user has used the application.
In this database, for example, UserId 01 has been on the platform on three different days.
The expected outcome looks like this:
Database
UserId Hour Date NumDates
01 18 01.01.2016 3
01 18 01.01.2016 3
01 14 02.01.2016 3
01 14 03.01.2016 3
02 21 03.01.2016 2
02 08 05.01.2016 2
02 08 05.01.2016 2
03 23 05.01.2016 1
So far I have used this command:
Database["NumDates"] <- Database %>% group_by(UserId) %>% summarise(NumDates = length(unique(Date)))
But it tells me that it is creating only 5,000 rows (the number of distinct users in my database) when I need 600,000+ (the number of sessions in my database).
If somebody could help me with this, it will be greatly appreciated!
We can use uniqueN from data.table
library(data.table)
setDT(Database)[, NumDates := uniqueN(Date) , by = UserId]
Database
# UserId Hour Date NumDates
#1: 1 18 01.01.2016 3
#2: 1 18 01.01.2016 3
#3: 1 14 02.01.2016 3
#4: 1 14 03.01.2016 3
#5: 2 21 03.01.2016 2
#6: 2 8 05.01.2016 2
#7: 2 8 05.01.2016 2
#8: 3 23 05.01.2016 1
You don't want summarise here but mutate. summarise gives you one row per distinct value of the column you grouped by, while mutate just adds another column and preserves the existing rows.
You could use n_distinct in dplyr:
library("dplyr")
database<- data.frame(UserId = c(1,1,1,1,2,2,2,3), Hour = c(18,18,14,14,21,8,8,23), Date = c("01.01.2016","01.01.2016","02.01.2016","03.01.2016","03.01.2016","05.01.2016","05.01.2016","05.01.2016"))
database %>% group_by(UserId) %>% mutate(NumDates = n_distinct(Date))
The result is as follows:
UserId Hour Date NumDates
(dbl) (dbl) (fctr) (int)
1 1 18 01.01.2016 3
2 1 18 01.01.2016 3
3 1 14 02.01.2016 3
4 1 14 03.01.2016 3
5 2 21 03.01.2016 2
6 2 8 05.01.2016 2
7 2 8 05.01.2016 2
8 3 23 05.01.2016 1

combine similar consecutive observations into one observation in R

I have a data set like this
date ID key value
05 1 3 2
05 1 3 5
05 1 3 1
05 1 5 2
05 1 7 3
05 1 7 3
05 1 3 4
05 2 9 8
I need the output to look like this
date ID key value
05 1 3 8
05 1 5 2
05 1 7 6
05 1 3 4
05 2 9 8
So, as you can see, if consecutive rows have the same date, ID, and key, I want to combine those observations and add their values. I need this to happen only where the events are consecutive.
Is it possible to do this in R? If yes, can anyone please tell me how?
Thanks
Use rle to look for consecutive sequences
# your data
df <- read.table(text="date ID key value
05 1 3 2
05 1 3 5
05 1 3 1
05 1 5 2
05 1 7 3
05 1 7 3
05 1 3 4
05 2 9 8", header=T)
# get consecutive values - add a grouping variables
r <- with(df, rle(paste(date, ID, key)))
df$grps <- rep(seq(r$lengths), r$lengths)
# aggregate values
a <- aggregate(value ~ date + ID + key + grps, data = df , sum)
# remove the grouping variable
a$grps <- NULL
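For comparison, a data.table sketch of the same idea: rleid() assigns the same run id to consecutive rows with identical date/ID/key, so the run grouping and the sum happen in one step (assuming df as read in above):

```r
library(data.table)

# Group by runs of identical date/ID/key and sum value within each run
out <- setDT(df)[, .(value = sum(value)),
                 by = .(grp = rleid(date, ID, key), date, ID, key)]
out[, grp := NULL]  # drop the helper run id
```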
