Count repeated values in a column - count

I have a column of year values by which I am sorting. I'd like to find the quantity per year (read: number of repeats of each year value). I'd like to chart said values. I'm not sure how to make this happen.
I am using Apple's Numbers '08, but if possible a general solution that multiple people could use would be preferred.

You should use the countif() function: http://office.microsoft.com/en-us/excel/HP052090291033.aspx
I did a similar thing to count how many hours of work there are for each upcoming version of my iPhone app. I was doing sumif(), but you just want countif().
See cells N4-N6 here: http://spreadsheets.google.com/ccc?key=0AhL0igVI9HVNdGpaS3U1cS1qOGVNd3h0Slg0a21vUWc&hl=en

On a new sheet, list the unique years in one column, then their quantity count in the column next to them. Select the entire range created, then create a chart.
I'm unsure from your question what you would specifically need more than this (and I work in Excel 2003).
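For anyone who would rather do the tally outside a spreadsheet, here is a minimal Python sketch of the same idea: list the unique years, count each one, and you have the two-column range to chart. The `years` list is made-up sample data, not the asker's column.

```python
from collections import Counter

# Hypothetical sample column of year values (stand-in for the real data).
years = [2007, 2008, 2008, 2009, 2007, 2008]

# Counter tallies how many times each distinct value occurs,
# which is what COUNTIF does for one year at a time.
counts = Counter(years)

# Sorted (year, count) pairs: the two-column range you would chart.
table = sorted(counts.items())
for year, n in table:
    print(year, n)
```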

Related

Grouping and transposing data in R

It is hard to explain this without just showing what I have, where I am, and what I need in terms of data structure:
What structure I had:
Where I have got to with my transformation efforts:
What I need to end up with:
Notes:
I've not given actual names for anything as the data is classed as sensitive, but:
Metrics are things that can be measured, for example the number of permanent or full-time jobs. The number of metrics is larger than presented in the test data (and the example structure above).
Each metric has many years of data. Whilst writing the code I have restricted myself to just 3 years, and the illustration of the structure is based on this test. The number of years captured will change over time; generally it will increase.
The number of policies will fluctuate. I've just labelled them policy 1, 2, etc. for sensitivity reasons, and limited the number whilst testing the code to make it easier to check the outputs.
The source data comes from a workbook of surveys with a tab for each policy. The initial import creates a list of tibbles consisting of a row for each metric, and 4 columns (the metric names, the values for 2024, the values for 2030, and the values for 2035). I converted this to a dataframe, created a vector to be a column header and used cbind() to put this on top to get the "What structure I had" data.
To get to the "Where I have got to with my transformation efforts" version of the table, I removed all the metric columns, created another vector of metrics and used rbind() to put this as the first column.
The idea in my head was to group the data by policy to get a vector for each metric, then transpose this so that the metric became the column, and the grouped data would become the row. Then expand the data to get the metrics repeated for each year. A friend of mine who does coding (but has never used R) has suggested using loops might be a better way forward. Again, I am not sure of the best approach so welcome advice. On Reddit someone suggested using pivot_wider/pivot_longer but this appears to be a summarise tool and I am not trying to summarise the data rather transform its structure.
Any suggestions on approaches or possible tools/functions to use would be gratefully received. I am learning R whilst trying to pull this data together to create a database that can be used for analysis, so, if my approach sounds weird, feel free to suggest alternatives. Thanks
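Since the illustrations are missing, the following is only a sketch of one possible reading of the target: one row per policy and year, with a column for each metric. It is written in plain Python to show the reshaping logic; the policy names, metric names and values are invented. In R, tidyr's pivot_longer() followed by pivot_wider() does this kind of restructuring, and neither function summarises the data.

```python
# Hypothetical input: one table per policy, each row is
# (metric name, value for 2024, value for 2030, value for 2035).
YEARS = [2024, 2030, 2035]
policies = {
    "policy 1": [("metric A", 10, 12, 15), ("metric B", 1, 2, 3)],
    "policy 2": [("metric A", 20, 22, 25), ("metric B", 4, 5, 6)],
}

def reshape(policies, years):
    """Return one record per (policy, year) with a key for every metric."""
    records = {}
    for policy, rows in policies.items():
        for metric, *values in rows:
            for year, value in zip(years, values):
                # setdefault creates the (policy, year) record on first use.
                rec = records.setdefault(
                    (policy, year), {"policy": policy, "year": year}
                )
                rec[metric] = value
    return list(records.values())

tidy = reshape(policies, YEARS)
for rec in tidy:
    print(rec)
```

Adding a policy, a metric or a year only changes the input; the function itself does not hard-code any of them.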

Subsetting rows, changing values, and placing them back into matrix?

I hope this has not been answered, but when I search for a solution to my problem I am not getting any results.
I have a data.frame of 2000+ observations and 20+ columns. Each row represents a different observation and each column represents a different facet of data for that observation. My objective is to iterate through the data.frame and select observations which match criteria (e.g. I am trying to pick out observations that are in certain states). After this, I need to subtract or add time to convert each one to its appropriate time zone (all of the times are in CST). What I have so far is an exorbitant number of subsetting commands that pick out the rows for the state being checked. When I try to write a for loop I can only get one value returned, not the whole row.
I was wondering if anyone had any suggestions or knew of any functions that could help me. I've tried just about everything, but I really don't want to have to go through each state of observations and modify the time. I would prefer a loop that could easily go through the data, select rows based on their state, subtract or add time, and then place the row back into its original data.frame (replacing the old value).
I appreciate any help.
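One way to avoid a subsetting command per state is a lookup table of offsets that is applied to every row in a single pass. The sketch below shows the idea in Python rather than R; the state codes, the offsets and the `to_local` helper are all hypothetical, and it ignores daylight saving and states that span more than one time zone.

```python
from datetime import datetime, timedelta

# Hypothetical offsets, in hours relative to Central time, keyed by state.
# Real data would need a complete table and daylight-saving handling.
OFFSET_FROM_CENTRAL = {"NY": 1, "IL": 0, "CO": -1, "CA": -2}

# Hypothetical observations; every 'time' starts out in Central time.
rows = [
    {"id": 1, "state": "NY", "time": datetime(2012, 3, 1, 12, 0)},
    {"id": 2, "state": "CA", "time": datetime(2012, 3, 1, 12, 0)},
    {"id": 3, "state": "IL", "time": datetime(2012, 3, 1, 12, 0)},
]

def to_local(rows, offsets):
    """Shift each row's time by its state's offset, updating rows in place."""
    for row in rows:
        hours = offsets.get(row["state"], 0)  # unknown states are left unchanged
        row["time"] = row["time"] + timedelta(hours=hours)
    return rows

to_local(rows, OFFSET_FROM_CENTRAL)
for row in rows:
    print(row["id"], row["state"], row["time"])
```

Because each row is modified where it sits, nothing has to be placed back into the original table afterwards.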

Google Spreadsheet IF and AND

I'm trying to find an easy formula to do the following:
=IF(AND(H6="OK";H7="OK";H8="OK";H9="OK";H10="OK";H11="OK";);"OK";"X")
This actually works. But I want to apply to a range of cells within a column (H6:H11) instead of having to create a rule for each and every one of them... But trying as a range:
=IF(AND(H6:H11="OK";);"OK";"X")
Does not work.
Any insights?
Thanks.
=ArrayFormula(IF(AND(H6:H11="OK");"OK";"X"))
also works.
Array formulas work the same way they do in Excel; they just need an ArrayFormula() wrapper to work (it is added automatically when you press Ctrl+Shift+Enter, as in Excel).
In Google Sheets the formula is:
=ArrayFormula(IF(SUM(IF(H6:H11="OK";1;0))=6;"OK";"X"))
In Excel:
=IF(SUM(IF(H6:H11="OK";1;0))=6;"OK";"X")
And confirm with Ctrl-Shift-Enter.
This counts how many cells in the range are equal to the criterion and compares that count to the number of cells in the range. So if the range is extended, increase the number 6 to match.
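The two approaches above (test every cell, or count the matches and compare with the range size) are the same check written two ways. A small Python sketch of the logic, with `cells` standing in for hypothetical contents of H6:H11:

```python
# Hypothetical contents of H6:H11.
cells = ["OK", "OK", "OK", "OK", "OK", "OK"]

def status_all(cells):
    """Mirror IF(AND(range="OK")): every cell must equal "OK"."""
    return "OK" if all(c == "OK" for c in cells) else "X"

def status_count(cells):
    """Mirror the SUM(IF(...)) version: count matches, compare to the range size."""
    matches = sum(1 for c in cells if c == "OK")
    # Comparing with len(cells) avoids hard-coding the 6.
    return "OK" if matches == len(cells) else "X"

print(status_all(cells), status_count(cells))
```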

BIRT Designer: Determining Percentage of Total for Values in a Column

I have a data set in BIRT Designer with two columns, one with day of week abbreviation names (Su, M, Tu, etc.) and the other with numerical representations of those days of the week starting at 0 and going to 6 (0, 1, 2, etc.). I want to determine what percentage of the total number of rows that each day of week represents. For example, if I have 100 total rows and 12 of those rows correspond to Su/0, 12% of the total rows are made up of Su.
I would like to perform this same calculation within BIRT and graph (bar graph) those percentages that each day consists of out of the total. I'm just learning how to use BIRT and assume that I need to do some scripting either when making my data set or when specifying the rows when making the chart. Any tips would be greatly appreciated.
Use computed columns.
Edit Data set > Computed Columns
The simplest way is to add one column that counts every row, plus one column for each day of the week. Each per-day column adds a count only if the day of the week is a specific value:
if (row["Day"] == "Su"){
1
}
I should add that you can use a 'data' element in your table to compute the percentage. A 'Dynamic Text' item could also be used, but the data item gives you a bound value that you can make better use of later if needed.
Edit
To get a total row count, use a computed column; I name mine 'All'.
For the Expression, use the value "1".
With some inspiration from James Jenkins I think I found my answer. It was pretty simple in the end, but all I needed to do was make a new computed column and instead of adding an expression, I simply set the Aggregation to "COUNT". That counts all of the rows in your table and puts that total on each row. That way you can use that total in any calculations that you may need to do. I have added a screenshot for clarity.
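The arithmetic behind both answers is a count per day divided by the total row count. Here it is as a short Python sketch; it is not BIRT script, and the `days` list is invented sample data.

```python
from collections import Counter

# Hypothetical day-of-week column, one entry per data set row.
days = ["Su", "M", "M", "Tu", "Su", "M", "W", "Th"]

total = len(days)        # the total that a COUNT aggregation puts on every row
per_day = Counter(days)  # number of rows for each day of the week

# Percentage of all rows that each day accounts for.
percent = {day: 100 * n / total for day, n in per_day.items()}
for day, p in percent.items():
    print(day, p)
```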

Column means over a finite range of rows

I am working with climate data in New Mexico and I am an R novice. I am trying to replace NA with means, but there are 37 different sites in my df. I want the means of each column computed separately for each unique DF$STATION.NAME (in column 1). I can't use data from one location to find the mean of another, obviously. So really I should have a mean for each month, for each station.
My data is organized by station.name vertically in column 1, with readings for the months Jan-Dec in the following columns and a total column at the end (right). Readings or observations are for each station for each month, over several years (the station name is listed in a new row for each new year).
I need to replace the NAs with the sums of the CLDD for the given month within the given station.name, how do I do this?
Try asking that question on https://stats.stackexchange.com/ (as suggested by the statistics tag), there are probably more R users there than on the general programming site. I also added the r tag to your question.
There is nothing wrong with splitting your data into station-month subsets, filling the missing values there, then reassembling them into one big matrix!
See also:
Replace mean or mode for missing values in R
Note that the common practice of filling missing values with means, medians or modes is popular, but may dilute your results since this will obviously reduce variance. Unless you have a strong physical argument why and how the missing values can be interpolated, it would be more elegant if you could find a way that can deal with missing values directly.
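If you do go with mean filling, the split, fill and reassemble idea can be sketched as follows. This is Python rather than R, the station names and readings are invented, and `fill_with_station_means` is a hypothetical helper; missing values are marked with None.

```python
from statistics import mean

# Hypothetical rows: one per station per year, None marks a missing reading.
readings = [
    {"station": "A", "jan": 10.0, "feb": 4.0},
    {"station": "A", "jan": None, "feb": 6.0},
    {"station": "A", "jan": 14.0, "feb": None},
    {"station": "B", "jan": 1.0, "feb": None},
    {"station": "B", "jan": 3.0, "feb": 2.0},
]
MONTHS = ["jan", "feb"]

def fill_with_station_means(rows, months):
    """Replace None with the mean of that month within the same station."""
    for month in months:
        # Collect the observed values for this month, grouped by station.
        observed = {}
        for row in rows:
            if row[month] is not None:
                observed.setdefault(row["station"], []).append(row[month])
        # Fill gaps only where the station has at least one observed value.
        for row in rows:
            if row[month] is None and row["station"] in observed:
                row[month] = mean(observed[row["station"]])
    return rows

filled = fill_with_station_means(readings, MONTHS)
for row in filled:
    print(row)
```

A station with no observed value at all for a month is left as None rather than borrowing from another station.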

Resources