PowerPivot 2010 pivot table: new measure for percentage difference

I'm a newbie with PowerPivot 2010. I've created a few reports, but I'm having difficulty with one that requires a new measure. My volume table (tblPilot_Volume) has two columns, Current Wk Vol This and Current Wk Vol Last, and I want to display the percentage difference between these two values in the pivot table.
I tried the calculation below but it doesn't work. I'm not sure what I'm doing wrong.
=sum(tblPilot_Volume[Current Wk Vol This]-(tblPilot_Volume[Current Wk Vol Last])/(tblPilot_Volume[Current Wk Vol Last])
Prior to this I tried just adding a column with the formula to the PowerPivot table. The calculation works in the table, but it doesn't work when I create the pivot table.
I also need to eliminate any rows where there is a zero in the Current Wk Vol Last column.

ME Miller, I would try this:
=(sum(tblPilot_Volume[Current Wk Vol This])-sum(tblPilot_Volume[Current Wk Vol Last])) / sum(tblPilot_Volume[Current Wk Vol Last])
When creating measures, make sure you always wrap column references in one of the aggregation functions (such as SUM, AVERAGE, MIN or MAX).
Hope this helps.
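To make the arithmetic behind that measure concrete, here is a minimal Python sketch (not DAX). It aggregates first and divides afterwards, and it optionally skips rows whose last-week volume is zero, as the question asks. The sample rows and the function name pct_difference are hypothetical and only serve as illustration.

```python
# Hypothetical rows standing in for tblPilot_Volume: (this_week, last_week)
rows = [(120, 100), (50, 0), (90, 100)]

def pct_difference(rows, skip_zero_last=True):
    """Return (sum(this) - sum(last)) / sum(last) over the kept rows."""
    kept = [(this, last) for this, last in rows
            if not (skip_zero_last and last == 0)]
    total_this = sum(this for this, _ in kept)
    total_last = sum(last for _, last in kept)
    if total_last == 0:
        return None  # undefined when there is no prior volume at all
    return (total_this - total_last) / total_last

result = pct_difference(rows)
print(result)
```

The point of the sketch is the order of operations: the sums are taken over all rows in the current context, and the division happens once on the totals rather than row by row.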

Pyspark GroupBy time span

I have data with a start and end date e.g.
+---+----------+----------+
| id|     start|       end|
+---+----------+----------+
|  1|2021-05-01|2022-02-01|
|  2|2021-10-01|2021-12-01|
|  3|2021-11-01|2022-01-01|
|  4|2021-06-01|2021-10-01|
|  5|2022-01-01|2022-02-01|
|  6|2021-08-01|2021-12-01|
+---+----------+----------+
For each month, I want a count of how many observations were "active", so that I can display it in a plot. By active I mean that the span from the start date to the end date includes the given month. The result for the example data should look like this:
[Image: example of a plot for the active times]
I have looked into the pyspark Window function, but I don't think that can help me with my problem. So far my only idea is to specify an extra column for each month in the data and indicate whether the observation is active in that month and work from there. But I feel like there must be a much more efficient way to do this.
You can use the SQL function sequence. It builds a date range from a start, an end and an interval, and returns it as an array.
Then you can use explode to flatten the array into one row per month, and count.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Make sure your Spark session is set to UTC.
# This SQL won't work well with a month interval if the time zone observes daylight saving time.
spark = (SparkSession
         .builder
         .config('spark.sql.session.timeZone', 'UTC')
         # ... other config
         .getOrCreate())

# One array element per month from start to end (both inclusive)
df = (df.withColumn('range', F.expr('sequence(to_date(`start`), to_date(`end`), interval 1 month)'))
        .withColumn('observation', F.explode('range')))
# Number of observations active in each month
df = df.groupby('observation').count()
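The expand-then-count idea can be checked on the example data without a Spark session. The following is a plain-Python sketch of the same logic, not a replacement for the PySpark code; the helper name month_range is made up for this illustration.

```python
from collections import Counter
from datetime import date

# Example data from the question: (id, start, end)
rows = [
    (1, date(2021, 5, 1), date(2022, 2, 1)),
    (2, date(2021, 10, 1), date(2021, 12, 1)),
    (3, date(2021, 11, 1), date(2022, 1, 1)),
    (4, date(2021, 6, 1), date(2021, 10, 1)),
    (5, date(2022, 1, 1), date(2022, 2, 1)),
    (6, date(2021, 8, 1), date(2021, 12, 1)),
]

def month_range(start, end):
    """Yield the first day of every month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

# "Explode" each row into its months, then count rows per month
active = Counter(m for _, start, end in rows for m in month_range(start, end))

for month in sorted(active):
    print(month, active[month])
```

Both the start month and the end month are counted as active here, which matches an inclusive sequence from start to end.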

Column operators regarding only specific columns (specific dates and code i.e.) in R

I am trying to calculate the average relative humidity (avg_relative_humidity) of the city of Seoul for the dates 2020-01-01 to 2020-01-31.
I have this data:
and I've tried this already, but I don't really know what's missing.
Seoul_weather_dt <- Corona_relevant_weather_dt[, avg_relative_humidity_seoul := mean(avg_relative_humidity[code =="2020-01-01":"2020-01-01"]), by = c("province", "date", "avg_temp", "avg_relative_humidity"]
Can someone help me?
Something like this?
#select only Seoul and relevant dates (both end dates included)
Seoul_weather_dt <- Corona_relevant_weather_dt[province == "Seoul" & date >= as.Date("2020-01-01") & date <= as.Date("2020-01-31")]
#calculate average humidity for each unique date
aggregate(Seoul_weather_dt$avg_relative_humidity, by = list(Seoul_weather_dt$date), FUN = mean)
The line of code you provide is pretty long. I would suggest splitting it into multiple lines with fewer functions per line to keep an overview (this also makes errors easier to locate). Also, a few remarks:
Is date of class "Date"? You can check that using str(Seoul_weather_dt).
code =="2020-01-01":"2020-01-01" only selects one day.
Using by = c("province", "date", "avg_temp", "avg_relative_humidity") is strange: you would then calculate a mean for each individual observation of avg_relative_humidity, which is not what you want.
Why create average values for each province when you are only interested in Seoul?
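The filter-first, aggregate-second approach can also be sketched in plain Python. The records below are hypothetical, since the question's actual data is not shown; only the column names come from the question.

```python
from datetime import date
from statistics import mean

# Hypothetical records; the real table's values are not shown in the question
records = [
    {"province": "Seoul", "date": date(2020, 1, 1), "avg_relative_humidity": 60.0},
    {"province": "Seoul", "date": date(2020, 1, 31), "avg_relative_humidity": 50.0},
    {"province": "Seoul", "date": date(2020, 2, 1), "avg_relative_humidity": 90.0},
    {"province": "Busan", "date": date(2020, 1, 15), "avg_relative_humidity": 70.0},
]

start, end = date(2020, 1, 1), date(2020, 1, 31)

# Step 1: keep only Seoul rows inside the date range (both ends inclusive)
seoul_january = [r["avg_relative_humidity"] for r in records
                 if r["province"] == "Seoul" and start <= r["date"] <= end]

# Step 2: a single mean over the filtered values
avg_humidity_seoul = mean(seoul_january)
print(avg_humidity_seoul)
```

Comparing real date objects rather than strings is what makes the range check reliable, which is the same reason the date column should be of class "Date" in R.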

How do I stop the number of observations coming up when trying to tabulate a variable?

I'm very new to R and have run into a problem while working on the code for a stats project. I have attached the .csv file below for reference. Essentially, I would like to plot the years 2018, 2019 and 2020 against the sum of international arrivals ("Int_Pax_In" in the file) over the first 6 months of each year, for the "All Australian Airports" entries. The plot would have 3 bars, one each for 2018, 2019 and 2020, with the y-axis labelled "All Australian Arrivals". Before attempting the final result, I wanted to start with a simple line of code to tabulate the "Year" variable, but simply putting in:
info=read.csv("mon_pax_web.csv")
table(info$Year)
doesn't give me any information. It simply gives me the number of observations for each year instead of anything else. Below is a screenshot of what I get:
Screenshot 1
info=read.csv("mon_pax_web.csv")
str(info)
table(info$Year)
I also tried changing my variables apart from "Year" into as.character and Month into factor but that had no effect as shown below:
Screenshot 2
info=read.csv("mon_pax_web.csv")
info$AIRPORT=as.character(info$AIRPORT)
info$Month=as.factor(info$Month)
info$Dom_Pax_In=as.character(info$Dom_Pax_In)
info$Dom_Pax_Out=as.character(info$Dom_Pax_Out)
info$Dom_Pax_Total=as.character(info$Dom_Pax_Total)
info$Int_Pax_Out=as.character(info$Int_Pax_Out)
info$Int_Pax_Total=as.character(info$Int_Pax_Total)
info$Pax_In=as.character(info$Pax_In)
info$Pax_Out=as.character(info$Pax_Out)
info$Pax_Total=as.character(info$Pax_Total)
info$Int_Pax_In=as.character(info$Int_Pax_In)
str(info)
table(info$Year)
I'm only allowed to use Base R for this project so would appreciate it a lot if people could help me out and if you do, provide coding using Base R so I could follow along. Just require some pointers so I could get started.
CSV File for reference
Thank you.
The column info$Year is just a vector of years, so when you call table(info$Year) it only shows the number of entries for each year, because that's what you have asked for. If I gave you the years 2011, 2011, 2012 and 2013 and asked you to tabulate them without any other information, all you could do is count the number of instances of each year. Presumably, this is not what you meant.
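As a rough analogy in Python (not R), tabulating a bare vector of years can only ever produce counts:

```python
from collections import Counter

# The four years from the explanation above
years = [2011, 2011, 2012, 2013]

# With nothing but the years, counting occurrences is all that can be done
counts = Counter(years)
print(counts)
```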
I'm guessing that what you're trying to do is get the sum of Int_Pax_In per year. First you should filter so that you only include the years of interest, the months of interest, and the rows that represent all Australian airports. You can do this using subset:
df <- subset(info, Year > 2017 & Month < 7 & AIRPORT == "All Australian Airports")
Now we can use tapply to find the sum for each year:
plot_table <- tapply(df$Int_Pax_In, df$Year, sum)
Finally, we use barplot to create the bar graph you wanted:
barplot(plot_table, main = "Arrivals at all Australian airports January - June")
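For readers who want to see what the subset and tapply steps compute, here is a small Python sketch of the same filter-then-sum-per-group logic. The rows are invented for illustration, and it assumes Month is stored as a number, which the question does not confirm.

```python
from collections import defaultdict

# Hypothetical rows: (airport, year, month, int_pax_in); Month assumed numeric
rows = [
    ("All Australian Airports", 2018, 1, 100),
    ("All Australian Airports", 2018, 7, 999),  # dropped: month is not < 7
    ("All Australian Airports", 2019, 3, 200),
    ("All Australian Airports", 2019, 6, 50),
    ("All Australian Airports", 2017, 2, 500),  # dropped: year is not > 2017
    ("SYDNEY", 2020, 1, 300),                   # dropped: not the all-airports row
    ("All Australian Airports", 2020, 2, 80),
]

# Same conditions as the subset() call, then one running sum per year
totals = defaultdict(int)
for airport, year, month, pax in rows:
    if year > 2017 and month < 7 and airport == "All Australian Airports":
        totals[year] += pax

print(dict(totals))
```

The resulting mapping from year to total plays the role of plot_table: one value per bar.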

Apache Drill: Group by week

I tried to group my daily data by week (given a reference date) to generate a smaller panel data set.
I used postgres before and there it was quite easy:
CREATE TABLE videos_weekly AS SELECT channel_id,
CEIL(DATE_PART('day', observation_date - '2016-02-10')/7) AS week
FROM videos GROUP BY channel_id, week;
But it seems like it is not possible to subtract a timestamp with a date string in Drill. I found the AGE function, which returns an interval between two dates, but how to convert this into an integer (number of days or weeks)?
DATE_SUB may help you here. Following is an example:
SELECT extract(day from date_sub('2016-11-13', cast('2015-01-01' as timestamp)))/7 FROM (VALUES(1));
This returns the number of weeks between 2015-01-01 and 2016-11-13.
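The week-bucketing arithmetic itself is independent of the SQL dialect. As a sketch in Python (not Drill SQL), mirroring the CEIL(days / 7) expression from the question with the same reference date; the function name week_number is hypothetical:

```python
from datetime import date
from math import ceil

REFERENCE = date(2016, 2, 10)  # reference date taken from the question

def week_number(observation_date, reference=REFERENCE):
    """Whole days since the reference date, rounded up to a week index."""
    days = (observation_date - reference).days
    return ceil(days / 7)

for d in (date(2016, 2, 10), date(2016, 2, 11), date(2016, 2, 17), date(2016, 2, 18)):
    print(d, week_number(d))
```

With this definition the reference date itself falls in week 0, and days 1 to 7 after it fall in week 1, so any grouping built on it inherits that convention.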

Count between months in Tableau

I need to count the months between collect dates, so that I know whether the test was run in the last 3 months. Below is the code I used, but it gives me a count of zero, even though I know 3 of the same tests were run within a year because I can see the dates. I understand why the first one has a count of zero, because there is no test before it, but the counts for the others should be 3 and 5 respectively.
DATEDIFF('month',[Collect Date],[Collect Date])
Dates of the Tests.
1/8/2015
4/23/2015
9/30/2015
What you are looking for is possible using the LOOKUP function in Tableau. Keep in mind that the result relies heavily on which data is displayed and how it is displayed (sort order, etc.).
You can create a calculated field like this:
DATEDIFF("month",LOOKUP(ATTR([Test Date]),-1),ATTR([Test Date]))
This calculates the number of months between the date in the prior row and the date in the current row.
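The previous-row comparison can be sketched in Python using the three dates from the question. This is an illustration, not Tableau code, and it assumes the month difference is counted as the number of month boundaries crossed, which is consistent with the 3 and 5 the question expects. The function name months_between is made up for the example.

```python
from datetime import date

# Dates of the tests from the question, in display order
test_dates = [date(2015, 1, 8), date(2015, 4, 23), date(2015, 9, 30)]

def months_between(earlier, later):
    """Count month boundaries crossed between two dates (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

# The first row has no prior row, so its difference is left empty (None)
diffs = [None] + [months_between(prev, cur)
                  for prev, cur in zip(test_dates, test_dates[1:])]
print(diffs)
```

As with LOOKUP, the outcome depends on the order of the rows: pairing each date with its predecessor only makes sense when the dates are sorted.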
Your result will look something like this:
