I need to find the average of a set of test scores, but need to drop the lowest - textpad

I am taking programming fundamentals right now and have an assignment that asks for the average of assignment grades. I have to drop the lowest grade, but I'm not sure how to do that part. The most recent thing we've learned is a "do loop", but I just don't know how to actually drop the lowest grade while getting the average of the highest 4.
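Since the question doesn't name a language, here is a minimal sketch in Python of the usual approach: loop over the grades once, keeping a running total and the smallest grade seen so far, then subtract the lowest before dividing. The function and variable names are illustrative, and the while loop stands in for whatever "do loop" your course uses.

```python
def average_drop_lowest(scores):
    lowest = scores[0]
    total = 0
    i = 0
    # One pass over the scores: accumulate the total and track the minimum.
    while i < len(scores):
        total += scores[i]
        if scores[i] < lowest:
            lowest = scores[i]
        i += 1
    # Remove the lowest score from the total, then average what remains.
    return (total - lowest) / (len(scores) - 1)

print(average_drop_lowest([88, 94, 57, 76, 90]))  # prints 87.0 (57 dropped)
```

The same two-variable idea (running total plus running minimum) translates directly into a do/while loop in any language.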

Related

Determining whether values between rows match for certain columns

I am working with some observation data and have run into an issue beyond my current capabilities. I surveyed different polygons (the column "PolygonID" in the screenshot) for lizards twice during a survey season. I want to determine the total search effort (the column "Effort") for each individual polygon within each survey round. The problem is that the software I was using to collect the data sometimes creates unnecessary repeats for polygons within a survey round. There is an example of this in the screenshot for the rows with PolygonID P3.
Most of the time this does not affect the effort calculations, because the start and end times for the repeated rows (the fields used to calculate effort) are the same, and I know how to filter the dataset so it only shows one line per polygon per survey. But I am concerned there might be some lines where the software glitched and assigned incorrect start and end times to one of the repeats. Is there a way I can test with R whether start and end times match for any such repeats, rather than manually going through all the data?
Thank you!
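One way to test this, sketched in Python/pandas (the question asks about R, where the analogue would be dplyr's group_by/summarise with n_distinct): count the distinct start and end values per polygon per survey round, and flag any group with more than one. Column names follow the question; the toy data is invented.

```python
import pandas as pd

# Toy data mimicking the question's columns (values invented):
# P3 is repeated within round 1 with two different End times.
df = pd.DataFrame({
    "PolygonID": ["P1", "P3", "P3", "P2"],
    "Survey":    [1, 1, 1, 1],
    "Start":     ["09:00", "10:00", "10:00", "11:00"],
    "End":       ["09:30", "10:45", "10:50", "11:20"],
})

# For each polygon within a survey round, count distinct start/end values;
# more than one distinct value means the repeated rows disagree.
mismatches = (
    df.groupby(["PolygonID", "Survey"])[["Start", "End"]]
      .nunique()
      .query("Start > 1 or End > 1")
)
print(mismatches)  # flags P3 / round 1, whose repeats have different End times
```

Groups where every repeat agrees collapse to a count of 1 in both columns, so only the suspect polygon/round combinations survive the `query`.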

Identifying starting date of a period with low values in a time series in R

I'm quite new to time series and am wondering how best to identify the starting date of a period of low values in a variable. In this example I would first want to i) identify whether there is such a period of, say, at least 5 similarly low values, and ii) find the starting date of that period.
So in this example (https://i.stack.imgur.com/IxLQg.png), for the first 3 individuals (c15793, c15798 and c3556) I want to find that there is such a period, starting on the 20th of May for c15798 and on the 22nd of May for the other two. But c5157 should be identified as not having such a period.
I have no clue how to identify such a period and was hoping someone could point me to a method or a starting point. Everything I can think of requires some sort of threshold (e.g. on the difference between consecutive measurements), which I don't know how to choose. So if anyone has a more elegant idea, or a good way to set a threshold, I would be more than happy to learn about it.
Thanks so much in advance!
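As a starting point, here is one possible approach sketched in Python/pandas rather than R (the logic ports directly to dplyr): pick a threshold, mark each observation as low or not, label consecutive runs of that mask, and keep the first run of at least 5 low values. The threshold below is just the midpoint of the observed range, purely for illustration, and the data is invented.

```python
import pandas as pd

# Toy series (values invented): three high days, then a sustained low period.
s = pd.Series(
    [9.1, 8.7, 9.0, 2.1, 2.0, 1.9, 2.2, 2.0, 2.3],
    index=pd.date_range("2021-05-18", periods=9),
)

# One simple threshold choice: the midpoint of the observed range.
threshold = (s.min() + s.max()) / 2

low = s < threshold                     # mask of "low" observations
run_id = (low != low.shift()).cumsum()  # label each consecutive run
runs = low.groupby(run_id).agg(["first", "size"])
# Keep runs that are low (first == True) and long enough (size >= 5).
qualifying = runs[runs["first"] & (runs["size"] >= 5)]

if not qualifying.empty:
    start_date = s.index[run_id == qualifying.index[0]][0]
    print("low period starts", start_date.date())  # prints: low period starts 2021-05-21
```

For an individual like c5157 with no sustained low period, `qualifying` comes back empty, which answers part i) directly.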

Take profit function of a timeseries

I am having great difficulty with this topic and I could use assistance from some experts.
I have a standard time series, I can have it as an XTS or as a dataframe with numeric inputs. The headers are typical: DATE, OPEN, HIGH, LOW, CLOSE, SMA20, SMA50, CROSSOVER.
SMA20 refers to the 20-period simple moving average. CROSSOVER records whether the SMA20 is above or below the SMA50: if it is positive the SMA20 is above, if negative it is below.
Here is the problem I am trying to solve: how do I create a separate column to track profits and losses? I want a take-profit at 100 pips and a stop-loss at 100 pips from the entry point. The entry point will always be the open of the next day, once the crossover happens.
What I am thinking so far is to use the open price and then look at the day's high and low: if the difference between the high and the open is greater than 100 pips, the trade gets closed; if it is less, the trade stays open. I have no idea how to begin coding this.
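The idea in the last paragraph could be sketched like this in Python/pandas for a single long trade. Column names follow the question; the pip size of 0.0001 is an assumption (a 4-decimal FX pair), and checking the stop before the target when both could be hit on the same day is an arbitrary choice, since daily bars don't reveal the intraday order.

```python
import pandas as pd

PIP = 0.0001        # assumed pip size for a 4-decimal FX pair
TARGET = 100 * PIP  # 100 pips

def trade_outcome(df, entry_row):
    """Walk forward from the entry day; return -100 (pips) if the stop
    loss is hit, +100 if the take profit is reached, None if still open."""
    entry = df.loc[entry_row, "OPEN"]
    for i in range(entry_row, len(df)):
        if df.loc[i, "LOW"] <= entry - TARGET:   # stop loss checked first
            return -100
        if df.loc[i, "HIGH"] >= entry + TARGET:  # take profit
            return 100
    return None

# Toy long trade entered at day 0's open: the day-2 high finally
# exceeds entry + 100 pips, so the take profit closes the trade.
prices = pd.DataFrame({
    "OPEN": [1.1000, 1.1010, 1.1050],
    "HIGH": [1.1030, 1.1060, 1.1120],
    "LOW":  [1.0980, 1.1000, 1.1040],
})
print(trade_outcome(prices, 0))  # prints 100
```

To build the profit/loss column, you would call something like this for each row where CROSSOVER changes sign, with `entry_row` set to the following day.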

Which format is tidy?

Let's say I have a dataset of ACT test scores. Each "observation" is a student's results from taking the ACT. The ACT has five subjects: reading, English, math, science, and writing (plus a composite score). Each test subject has a scale score, a national percentile rank, and a college readiness indicator (Y or N).
My question is (and always seems to be since I work a lot with assessment data), which format is "tidy"?
1. Where each row is a unique student test + subject combination, with a subject column and then scaleScore, percentile, and readiness columns for each value.
2. Where each row is a unique student test, with all the subjects and their respective values listed out in separate columns.
3. Or something like the first option, but split into six tables, one for each subject, with a key to join on?
I've been working in SQL + Excel for a while, but I want to expand my EDA skills in R. Any help would be much appreciated! The key focus is on subsequent visualization with ggplot. I'm guessing the answer may just be "it depends" with a willingness to gather and spread for different plotting purposes.
Columns being student, test, subject, scaleScore, percentile, readiness.
Student and test variables would identify each observation.
Subject is a variable. Reading, English, math, etc. are values of the subject variable. This is essentially the heart of the tidy approach, which tends to be deep, not wide, and lends itself to joining, grouping, plotting, and so forth.
OR to make it really tidy, score and scoreType are variables, and their respective values are included as observations.
Either way, in one table the student and test would be repeated on multiple rows. But this serves to illustrate the tidy perspective. Clearly, normalized tables are a worthy consideration, in terms of the big picture.
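The "really tidy" layout can be reached with one reshape. Here is a sketch in Python/pandas (in R this would be tidyr's gather/pivot_longer, which the question already anticipates): melt the subject_scoreType columns into rows, then split the combined name into subject and scoreType variables. Column names follow the answer; the data is invented.

```python
import pandas as pd

# Toy wide layout (data invented): one row per student test,
# with subject_scoreType columns.
wide = pd.DataFrame({
    "student": ["A", "B"],
    "test": ["ACT-2021", "ACT-2021"],
    "math_scaleScore": [28, 31],
    "math_percentile": [85, 95],
    "reading_scaleScore": [30, 25],
    "reading_percentile": [90, 70],
})

# Melt to one row per student/test/measurement, then split the combined
# column name into separate subject and scoreType variables.
tidy = wide.melt(id_vars=["student", "test"], var_name="key", value_name="value")
tidy[["subject", "scoreType"]] = tidy["key"].str.split("_", expand=True)
tidy = tidy.drop(columns="key")
print(tidy.head())
```

Pivoting `scoreType` back out of this deep table recovers the first option (subject rows with scaleScore/percentile columns), which is the gather/spread flexibility the question mentions.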

Create column in matrix that is a ratio of the values of two other columns in Power BI

I have a matrix that has the total premiums for each year, and total commissions for each year.
I would like a third column that shows what the ratio for total commissions to total premiums is for each year. I thought this would be easy but the values I get aren't right at all. Here's the formula I tried:
Current formula
And here's how the column, Gross Commission %, came out:
Current table
Obviously this is all wrong. I've tried using DIVIDE([Gross Commission (+ve)],[Gross Written Premium]) instead but that didn't work either.
I've scrambled all of the premium and commission data before posting here, so what you see is not real-world data, but the behaviour is exactly the same with the real data.
I would use the same formula, but created as a Measure, not a Column. From PBI Desktop's Modeling ribbon, click New Measure.
Then the calculation will run after the default Sum aggregation has occurred on the other columns, and it will obey the Year context of each row in the matrix.
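A minimal sketch of that measure, assuming the columns live in a table I'll call 'Fact' (substitute your own table name):

```dax
Gross Commission % =
DIVIDE (
    SUM ( 'Fact'[Gross Commission (+ve)] ),
    SUM ( 'Fact'[Gross Written Premium] )
)
```

Because each SUM is evaluated inside the Year filter context of the matrix row, the ratio is computed from the yearly totals rather than row by row, which is why the calculated-column version gave wrong values.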
