I'm trying to write a Crystal Report which has totals grouped in a different way to the main report. The only way I've been able to do this so far is to use a subreport for the totals, but it means having to hit the data source again to retrieve the same data, which seems like nonsense. Here's a simplified example:
date         name    earnings   source         location
-----------------------------------------------------------
12-AUG-2008  Tom     $50.00     washing cars   uptown
12-AUG-2008  Dick    $100.00    washing cars   downtown       { main report }
12-AUG-2008  Harry   $75.00     mowing lawns   around town

total earnings for washing cars: $150.00                      { subreport }
total earnings for mowing lawns: $75.00

date         name    earnings   source         location
-----------------------------------------------------------
13-AUG-2008  John    $95.00     dog walking    downtown
13-AUG-2008  Jane    $105.00    washing cars   around town    { main report }
13-AUG-2008  Dave    $65.00     mowing lawns   around town

total earnings for dog walking: $95.00
total earnings for washing cars: $105.00                      { subreport }
total earnings for mowing lawns: $65.00
In this example, the main report is grouped by 'date', but the totals are grouped additionally by 'source'. I've looked up examples of using running totals, but they don't really do what I need. Isn't there some way of storing the result set and having both the main report and the subreport reference the same data?
Hmm... as nice as it is to call the stored proc from the report and have it all contained in one location, we found (like you) that you eventually hit a point where you can't get Crystal to do what you want, even though the data is right there.
We ended up introducing a business layer that sits under the report: rather than having the report "pull" data, we "push" the datasets to it and bind them to the report. The advantage is that you can manipulate the data in code, in datasets or objects, before it ever reaches the report, and then simply bind the result.
This article has a nice intro on how to set up pushing data to reports. I understand that your time/business constraints may not allow you to do this, but if it's at all possible I'd highly recommend it, as it has meant we could move all the "coding" out of our reports and into managed code, which is always a good thing.
The only way I can think of doing this without a second pass through the data would be to create formulas that keep running totals per group. The problem I assume you are running into with the built-in running totals is that they are intended to follow each of the groups they are totalling. Since you want the subtotals to appear after all of the 'raw' data, that won't work.
If you create your own formula for each group that simply adds up the earnings from the rows matching that group, you should be able to place them at the end of the report. The downside to this approach is that the resulting subtotals will not be dynamic with respect to the groups. In other words, if you had a new 'source' it would not show up in the subtotals until you added a formula for it, and if you had no 'dog walking' data you would still have a subtotal for it.
Related
I am very new to R programming, and I have a few datasets that I'm playing around with. One of the things I'm trying to do is use ggplot to graph what percentage of the population in each state voted in the 2016 election.
The first csv file I have contains an estimate of the population of each state in 2016, and the second csv file I have contains the number of votes cast by each party in the 2016 election. I'm not sure how to attach the file here, so I will show some screenshots:
2016 Election Votes:
2016 State Populations:
From what I understand, I can read the 2016 election votes csv file, and create a new column that contains the total votes using something like:
electionVotes$TotalVotes <- electionVotes$DemocraticCandidates + electionVotes$RepublicanCandidates + electionVotes$OtherCandidates
Once I have that, I would like to create a column where I do something like:
electionVotes$PercentVoted <- electionVotes$TotalVotes / *number of people per state*
I understand how to use ggplot to display the results, but what is confusing to me is how I can accurately use these tables with each other when one State column uses an abbreviation for the state name, like "AL", while the other one uses ".Alabama".
Any thoughts on what would be the best process to do this other than manually editing the csv file? Thank you!
You could bring in a table (http://app02.clerk.org/menu/ccis/Help/CCIS%20Codes/state_codes.html) that links the abbreviations to the full state names and use that to join the two datasets.
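If you'd rather not download a separate table, R ships with the built-in vectors state.abb and state.name, which give you the same mapping. Here is a minimal sketch of the join, assuming your data frames are called electionVotes (with a State column like "AL") and statePop (with a State column like ".Alabama" and a Population column); adjust the names to whatever your csv files actually use:

# Lookup table from R's built-ins (50 states only; note DC is not included).
lookup <- data.frame(State = state.abb, FullName = state.name,
                     stringsAsFactors = FALSE)

# Strip the leading "." so ".Alabama" becomes "Alabama".
statePop$FullName <- sub("^\\.", "", statePop$State)
statePop$State    <- NULL

# Attach the abbreviation to the population table, then join it to the votes.
statePop <- merge(statePop, lookup, by = "FullName")
merged   <- merge(electionVotes, statePop, by = "State")

merged$PercentVoted <- merged$TotalVotes / merged$Population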
I am trying to create a simple revenue per person calc that works with different filters within the data. I have it working for a single record; however, it breaks and aggregates incorrectly with multiple records.
The formula I have now is simply Sum([Revenue]) / Sum([Attendance]). This works when I only have a single event selected. However, as soon as I select multiple shows it aggregates and doesn't do the weighted avg.
I'm making some assumptions here, but hopefully this will help you out. I've created an .xlsx file with the following data:
Event     Revenue   Attendance
Event 1   63761     6685
Event 2   24065     3613
Event 3   69325     4635
Event 4   41996     5414
Inside Tableau I've created the calculated field for Rev Per Person.
Finally, in the Analysis dropdown I've enabled Show Column Grand Totals, which adds a Grand Total row to the bottom of the view.
Simple Fix
The problem is that all of the column totals are being calculated using the SUM aggregation. This is the desired behavior for Revenue and Attendance, but for Rev Per Person, you want to display the average.
In Analysis / Totals / Total All Using you can configure the default aggregation for all of the totals at once. We don't want to change every total here, but it's useful to know about. Leave that where it is, and instead click on the Rev Per Person Grand Total value and change it from 'Automatic' to 'Average'.
Now you'll see a number much closer to the expected value.
But it's still not exactly what you expect: the average of all the Rev Per Person values gives us $9.73, whereas total Revenue / total Attendance should give $9.79.
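Just to show where the two figures come from, here is the arithmetic in R using the four events above (nothing Tableau-specific, only a sanity check on the numbers):

revenue    <- c(63761, 24065, 69325, 41996)
attendance <- c(6685, 3613, 4635, 5414)

# Average of the per-event ratios: what the 'Average' grand total shows
mean(revenue / attendance)        # ~9.73

# Total revenue over total attendance: the weighted figure you actually want
sum(revenue) / sum(attendance)    # ~9.79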
Slightly More Involved Fix
First, undo the simple fix, so all of the totals are back on their default setting. Instead, we'll modify the Rev Per Person calculation.
IF Size() > 1 THEN
    // Grand Total
    SUM([Revenue]/[Attendance])
ELSE
    // Regular View
    SUM([Revenue])/SUM([Attendance])
END
Size() is being used to determine if the calculation is being done for an individual cell or not.
More information on Size() and similar functions can be found on Tableau's website here - https://onlinehelp.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.html
Now I see the expected value of $9.79.
So, sometimes I need to get data from the web and organize it into a data frame, and I waste a lot of time doing it manually. I've been trying to figure out how to optimize this process. I've tried some R scraping approaches but couldn't get them to work right, and I thought there might be an easier way to do this. Can anyone help me out?
Fictional exercise:
Here's a webpage with countries listed by continents: https://simple.wikipedia.org/wiki/List_of_countries_by_continents
Each country name is also a link that leads to another webpage (specific of each country, e.g. https://simple.wikipedia.org/wiki/Angola).
As a final result, I would like a data frame with one row per country listed and 4 variables (columns): ID = country name, Continent = the continent it belongs to, Language = official language (from each country's own page) and Population = most recent population count (also from each country's own page).
Which steps should I follow in R in order to be able to reach to the final data frame?
This will probably get you most of the way. You'll want to play around with the different nodes and probably do some string manipulation (clean up) after you download what you need.
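As a starting point, here is a rough rvest sketch of that idea. The CSS selectors ("li a", "table.infobox") and the infobox row labels ("language", "Population") are assumptions about the page structure and will almost certainly need adjusting once you inspect the real pages:

library(rvest)

list_url  <- "https://simple.wikipedia.org/wiki/List_of_countries_by_continents"
list_page <- read_html(list_url)

# Assumption: country names are the links inside the page's bulleted lists.
# Expect extra links (navigation, references) that you will have to filter out.
country_nodes <- html_nodes(list_page, "li a")
countries <- data.frame(
  ID   = html_text(country_nodes),
  href = html_attr(country_nodes, "href"),
  stringsAsFactors = FALSE
)

# Visit one country page and pull the language/population rows from its infobox.
get_country_info <- function(href) {
  page    <- read_html(paste0("https://simple.wikipedia.org", href))
  infobox <- as.data.frame(html_table(html_node(page, "table.infobox")))
  lang <- infobox[grepl("language",   infobox[[1]], ignore.case = TRUE), 2][1]
  pop  <- infobox[grepl("population", infobox[[1]], ignore.case = TRUE), 2][1]
  c(Language = as.character(lang), Population = as.character(pop))
}

# Try a handful of rows first; pages without a standard infobox will need care.
sample_info <- t(sapply(countries$href[1:5], get_country_info))
cbind(countries[1:5, ], sample_info)

The Continent column isn't captured here: you would get it by walking the list page's continent headings and recording which section each country link sits under, which is the kind of clean-up mentioned above.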
I am currently attempting to implement a trading idea that I have been playing around with. It consists of 50+ securities and has a strategy very similar to this one. (Current package I am using is quantmod).
http://www.r-bloggers.com/backtesting-a-simple-stock-trading-strategy/
For those who aren't interested in clicking, it is a strategy that looks at the past X days (in his case 200) and enters a position depending on the peak the stock has reached. I understand how to apply this strategy to my idea, but I cannot grasp how to aggregate my data into one summary.
Is there a way I can consolidate the summary for all the positions I have entered into one larger portfolio summary and chart that against the S&P 500?
Any advice on where I can find resources, or on being pointed to the right information, would be appreciated. I have looked at the portfolio analysis package for R and I do not believe that will be much help to me.
Thank you in advance.
Edit: In the link, at the bottom, there are 3 indexes: FTSE, N225 and DJIA. Could I combine those 3 summaries to show the same output as below, but combined?
FTSE:
                                 Me       Index
Cumulative Return        3.56248582   3.8404476
Annual Return            0.05667121   0.0589431
Annualized Sharpe Ratio  0.45907768   0.3298633
Win %                    0.53216374   0.5239884
Annualized Volatility    0.12344579   0.1786895
Maximum Drawdown        -0.39653398  -0.5256991
Max Length Drawdown      1633.00000   2960.0000
Could I get that same output, but for the 3 securities' data combined? Is there an effective way of doing that? Thank you so much. Happy holidays.
It's a little unclear to me what you mean by "combine" in this case. If you want a single column representing the combined returns from all three exchanges as if they were a single unified market, that's really tricky, because the exchanges trade in different currencies (British pounds, U.S. dollars, Japanese yen, etc.). The underlying analysis would have to be modified substantially to take into account fluctuating daily foreign exchange rates.
I suspect that this is NOT what you want. Rather, you are simply asking how to take three sequential two-column outputs and turn them into a single parallel six-column output.
If that is indeed what you want, then you need to rewrite the testStrategy() function shown near the bottom of the link. As it's currently written, that function takes three inputs: an index name myStock (with allowed values of FTSE, DJIA, or N225), and two integer values, nHold and nHigh. You would need to change it so that it instead accepts five inputs; e.g., myStockA, myStockB and myStockC, plus the two integer values already mentioned. Then each of the lines currently referring to myStock would have to be replicated three times. Finally, the two cbind() lines that you see at the bottom would have to be modified so that instead of merging the data together into only two columns, you include all six.
For a good intro tutorial on how to write and modify your own R functions, please see this. To understand how to use the cbind() function, which you will have to call with six rather than two inputs, please see this.
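If rewriting the internals of testStrategy() feels like too much, a simpler sketch of the same idea is to call it once per index and bind the outputs yourself. This assumes testStrategy() returns (or is wrapped so that it returns) the two-column "Me / Index" summary shown above for a single index; nHold and nHigh are the same two integers discussed in the post:

# Placeholder values; use whatever holding period / lookback you ran before.
nHold <- 100
nHigh <- 200

res_ftse <- testStrategy("FTSE", nHold, nHigh)
res_djia <- testStrategy("DJIA", nHold, nHigh)
res_n225 <- testStrategy("N225", nHold, nHigh)

# Bind the three two-column summaries into one six-column table.
combined <- cbind(res_ftse, res_djia, res_n225)
colnames(combined) <- c("FTSE.Me", "FTSE.Index",
                        "DJIA.Me", "DJIA.Index",
                        "N225.Me", "N225.Index")
combined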
I'm trying to develop a ChartCustomizer that takes the data from a chart and converts it into a histogram (because JR does not directly support histograms). It's a fairly simple implementation with hard-coded intervals, etc. mostly as a proof-of-concept at this point.
The data I'm analyzing is HTTP response-time data of the form [date, response-time] and I have a CSV file with 18512 records in it. In my summary band, I have 3 items:
A text field dumping $V{REPORT_COUNT} (it reports 18512 in iReport's report preview)
A time series showing all the data points [date, response-time]
A category plot containing all the data points in a single series [category=$F{DATE}, value=$F{RESPONSE_TIME}]
I decided that the most straightforward way to build a histogram would be to use the Category plot because it had the right structure for the final histogram chart.
When the ChartCustomizer runs, it dumps out all kinds of good information about the data set, including the size. Strangely, the size is 10252: it's missing something like 8000 data points. I can't understand why the category plot would have fewer data points than the whole data set.
Any ideas?
Answering my own question in case others run across this foolish user error.
The problem was that CategoryDataset only allows one data point per "category", and in my case, "category" was a java.util.Date captured from the web server log. Apparently, nearly half of my dates were duplicates and so part of the data set overwrote the other half, leaving a subset of the data.
That should have been totally obvious to me at the outset, because that is exactly how a category dataset works.
Anyhow, simply changing the category plot series's "category expression" from $F{DATE} to $V{REPORT_COUNT} gave each datum a unique category, which makes everything work.