Count the number of blank cells below a non-blank cell

I am working in Microsoft Excel 2013. In the screenshot below, I have a list of people's names (under the SURNAME column) and the household each person belongs to (under New No.). I would like Excel to count the number of people in each household and enter that number in the HH Siz column. One person, Ntuyu, lives in WC335. Two people, N. Dlakiya and Y. Dlakiya, live in WC415. I need a formula to do this automatically. Please help. Thanks.
[Screenshot of the Excel sheet]

Assuming New No. is in column A, SURNAME in column B and HH Siz in column C, put this in C2 and copy it down the list:
=IF(A2<>"",COUNTA($B2:B$1048576)-SUM(C3:C$1048576),"")
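The formula works because each household's first row counts every name from that row to the bottom of column B, then subtracts the sizes already assigned to the households further down, which leaves only the names belonging to this household. A minimal Python sketch of the same logic, assuming the New No. code appears only on the first row of each household (blank, here "", on the others); `household_sizes` is a hypothetical helper, not an Excel feature, and None stands in for the formula's "" result:

```python
from typing import List, Optional


def household_sizes(households: List[str], names: List[str]) -> List[Optional[int]]:
    """Mimic =IF(A2<>"",COUNTA($B2:B$1048576)-SUM(C3:C$1048576),"") row by row.

    households: column A (New No.), "" where the cell is blank (assumption).
    names: column B (SURNAME).
    Returns column C (HH Siz), with None where the formula would show "".
    """
    sizes: List[Optional[int]] = [None] * len(names)
    # Each row depends only on the rows below it, so filling from the bottom up
    # gives the same values Excel's recalculation arrives at.
    for i in range(len(names) - 1, -1, -1):
        if households[i] != "":  # A2<>""
            names_from_here = sum(1 for name in names[i:] if name != "")  # COUNTA($B2:B$1048576)
            assigned_below = sum(s for s in sizes[i + 1:] if s is not None)  # SUM skips "" text
            sizes[i] = names_from_here - assigned_below
    return sizes


print(household_sizes(["WC335", "WC415", ""], ["Ntuyu", "N. Dlakiya", "Y. Dlakiya"]))
# prints [1, 2, None]
```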

Related

How to scrape a complex table which has columns spanning multiple rows (a pedigree chart) in R?

I've looked at all the other related Stack Overflow questions, and none of them is close enough to what I'm trying to do to be useful. That's partly because, although some of them deal with tables whose left-hand columns span multiple rows (as in a pedigree chart), they don't address how to handle the messy HTML that generates the chart. When I try the usual ways of ingesting the table with rvest, it really doesn't work.
The table I'm trying to scrape looks like this:
When I extract the HTML of the first row (tr) of the table, I see that it contains: Betty, Jack, Bo, Bob, Jim, Dan, b 1932 (the very top of the table).
Great, you say, but not so fast: with this structure there's no way to know that Betty's mom is Sue, because Sue is on a different row.
Sue's row doesn't include Betty, but instead starts with Sue herself.
So in this example, Sue's row would be: Sue, Owen, Jacob, Luca, Blane, b 1940.
Furthermore, row #2 in the HTML is actually just Ava, b 1947.
That is, here's the content of each HTML row:
I tried using rvest to download the page and then extract the table, à la:
library(rvest)
pedigree <- read_html(page) %>% html_nodes("#PedigreeTable") %>% html_table
It really didn't work. Oddly, every column came out duplicated. That's not too bad, but I'd rather have a tibble/data frame/matrix whose first column is 32 Bettys, whose next column is 16 each of Jack and Sue, and so on.
I hope this is all clear as mud!
Ideally, as output, I'd get a neat data frame with the columns person, father, mother, like so:
Thanks in advance!
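Given the row structure described in the question (the first HTML row holds one name per generation, and every later row holds only the names whose block starts on that row, always in the right-most columns), the rows can be placed into a grid and forward-filled to get the "32 Bettys" layout. A minimal Python sketch, assuming the cell text of each `tr` has already been extracted into a list of names with birth-year text such as "b 1932" stripped, and that every row is right-aligned; the function names are hypothetical and the names Ann and Eve in the usage example are made up:

```python
from typing import List, Optional

Grid = List[List[Optional[str]]]


def html_rows_to_grid(rows: List[List[str]]) -> Grid:
    """Right-align each HTML row: a row with k cells holds the last k generations."""
    n_cols = max(len(cells) for cells in rows)
    return [[None] * (n_cols - len(cells)) + list(cells) for cells in rows]


def forward_fill(grid: Grid) -> Grid:
    """Replace each blank cell with the nearest name above it in the same column."""
    filled = [list(row) for row in grid]
    n_cols = len(filled[0]) if filled else 0
    for col in range(n_cols):
        current: Optional[str] = None
        for row in filled:
            if row[col] is None:
                row[col] = current
            else:
                current = row[col]
    return filled


# A three-generation chart instead of six: Betty's parents are Jack and Sue.
rows = [["Betty", "Jack", "Bo"], ["Ann"], ["Sue", "Owen"], ["Eve"]]
for row in forward_fill(html_rows_to_grid(rows)):
    print(row)
```

The grid with blanks still encodes who belongs to whom, so it is a reasonable intermediate step before building the person/father/mother table.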
Maybe writing an algorithm can do it, like this:
Select only the last two columns:
father_name = first value of the penultimate column
then scan down the column to the next non-NA value, counting each row:
count = 1 + number of NA values
mother_name = second non-NA value
then count all rows until you find a name:
count = count + 1 + number of NA values
Create your final table with:
name | father | mother
Isolate all of the family's child names and save them in your final table.
Assign father_name and mother_name to the corresponding columns in your final table.
Delete all the rows used and start again.
Once you have assigned all the last-column people to their parents, delete the last column.
Then delete all blank rows to get a structure similar to the one needed in the first step, and start the algorithm again.
Hope that helps!
PS: I suggest giving each person a unique ID at some point, to avoid confusion between people who have the same name.
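The steps above can be sketched in Python under one reading of them: for each name in the penultimate column, the non-blank names in the last column, from that name's row down to the next name in the penultimate column, are taken as its father (first) and mother (second). That reading matches the question's chart (child on the left, parents on the right) and is an interpretation, not necessarily the answerer's exact intent. The input is assumed to be a grid with None for blank cells; `parents_from_grid` is a hypothetical name, the names Ann and Eve are made up, and people are identified by name only, so the unique-ID advice still applies:

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]


def parents_from_grid(grid: Grid) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Pair the last two columns, record child/father/mother, then shrink the grid and repeat."""
    work = [list(row) for row in grid]
    result: List[Tuple[str, Optional[str], Optional[str]]] = []
    while work and len(work[0]) > 1:
        child_col, parent_col = len(work[0]) - 2, len(work[0]) - 1
        # Rows where a name appears in the penultimate column.
        starts = [r for r, row in enumerate(work) if row[child_col] is not None]
        for i, start in enumerate(starts):
            # The child's block runs down to the next name in its column
            # (the "1 + number of NA values" row count).
            end = starts[i + 1] if i + 1 < len(starts) else len(work)
            parents = [work[r][parent_col] for r in range(start, end) if work[r][parent_col] is not None]
            father = parents[0] if parents else None
            mother = parents[1] if len(parents) > 1 else None
            result.append((work[start][child_col], father, mother))
        # Delete the last column, then every row that is now completely blank.
        work = [row[:-1] for row in work]
        work = [row for row in work if any(cell is not None for cell in row)]
    return result


grid = [
    ["Betty", "Jack", "Bo"],
    [None, None, "Ann"],
    [None, "Sue", "Owen"],
    [None, None, "Eve"],
]
print(parents_from_grid(grid))
# prints [('Jack', 'Bo', 'Ann'), ('Sue', 'Owen', 'Eve'), ('Betty', 'Jack', 'Sue')]
```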

Paste name of column to other columns in R?

I recently received the output of an online survey (ESRI Survey123), which stores each recorded attribute as a new column of the table. The survey records characteristics of single trees located on a study site, e.g. beech1, beech2, etc. For each beech, several attributes are recorded, such as height, shape, etc.
This is what the output table looks like in Excel. ID simply represents the site number:
Now I wonder: how can I read these data into R so that columns 1:3 belong to beech1, columns 4:6 belong to beech2, etc.? I am looking for something that would paste beech1 into the names of the following columns, giving beech1.height, beech1.shape. But I am not sure how to do it.
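One way to build such names is to walk the header from left to right, remember the most recent tree column, and prefix each attribute column with it. A minimal Python sketch of that renaming step, assuming the raw header row reads ID, beech1, height, shape, beech2, height, shape, ... and that a tree column can be recognised as letters followed by digits (both are assumptions, since the screenshot is not included; an attribute name that also ends in digits would be misread). `prefix_attribute_columns` is a hypothetical helper:

```python
import re
from typing import List, Optional

# Assumption: tree columns look like "beech1", "beech12", i.e. letters then digits.
TREE_COLUMN = re.compile(r"[A-Za-z]+\d+")


def prefix_attribute_columns(header: List[str]) -> List[str]:
    """Prefix each attribute column with the tree column that precedes it."""
    current_tree: Optional[str] = None
    renamed = []
    for name in header:
        if TREE_COLUMN.fullmatch(name):
            current_tree = name  # start of a new tree's block
            renamed.append(name)
        elif current_tree is not None:
            renamed.append(f"{current_tree}.{name}")  # e.g. beech1.height
        else:
            renamed.append(name)  # columns before the first tree, such as ID
    return renamed


print(prefix_attribute_columns(["ID", "beech1", "height", "shape", "beech2", "height", "shape"]))
# prints ['ID', 'beech1', 'beech1.height', 'beech1.shape', 'beech2', 'beech2.height', 'beech2.shape']
```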

Is there a way to delete all rows that don't have numbers in string

So I've extracted a dataset of customers from our system and loaded it. To send out customer letters, I need the dataset to contain only the street names and the postal code with city. The names and the NA rows need to be removed, as do all empty lines. I only need the address and zip code.
Therefore I need to delete all rows that don't contain a number.
I've found the answer.
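library(stringr)
# Keep only rows whose ... column contains at least one digit; the comma selects rows, and which() also drops NA rows
df[which(str_detect(df$..., "\\d")), ]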
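The filter keeps a row when its text contains at least one digit (`"\\d"` in the R string is the regex `\d`). The same rule as a minimal Python sketch, assuming the relevant column has been read into a list where empty cells are None; `rows_with_digit` is a hypothetical helper and the sample values are made up:

```python
import re
from typing import List, Optional


def rows_with_digit(values: List[Optional[str]]) -> List[str]:
    """Keep only values that contain a digit; None (an NA/empty row) is dropped."""
    return [v for v in values if v is not None and re.search(r"\d", v)]


sample = ["John Smith", None, "Main Street 12", "", "1011 AB Amsterdam"]
print(rows_with_digit(sample))
# prints ['Main Street 12', '1011 AB Amsterdam']
```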

lookup count on different tab using 2 variables

I have data on one Excel sheet with about 10 columns, but I need to count data using just two of them. For example:
Column C has district names and Column D says what their title is.
The file has hundreds of entries, and I need to know how many Admins are in district 1. The catch is that the results need to be on a different tab. I have tried using VLOOKUP, but my knowledge only takes me as far as looking up a single criterion.
If your "one Excel sheet" is called Sheet1, with district 1 in column C and Admins in column D of that sheet, then on a different tab enter:
=COUNTIFS(Sheet1!C:C,"district 1",Sheet1!D:D,"Admins")
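COUNTIFS counts the rows in which every range/criterion pair matches. As a rough illustration of that logic (not of Excel itself), here is a Python sketch that assumes each row is a dict keyed by column letter; `countifs` is a hypothetical helper, the rows are made up, and only exact, case-insensitive text criteria are modelled, not COUNTIFS wildcards or comparison operators such as ">5":

```python
from typing import Dict, List


def countifs(rows: List[Dict[str, str]], **criteria: str) -> int:
    """Count rows where every column equals its criterion, ignoring case."""
    return sum(
        1
        for row in rows
        if all(row.get(col, "").casefold() == wanted.casefold() for col, wanted in criteria.items())
    )


# Made-up rows keyed by column letter: C = district, D = title.
rows = [
    {"C": "district 1", "D": "Admins"},
    {"C": "District 1", "D": "admins"},
    {"C": "district 2", "D": "Admins"},
    {"C": "district 1", "D": "Clerk"},
]
print(countifs(rows, C="district 1", D="Admins"))
# prints 2
```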

count numbers in column not duplicates

I have an Excel file with about 5,500 lines. One column is all numbers with many repeats, like zip codes. How can I count how many numbers are in that column while eliminating duplicates? For example, maybe there are only 250 distinct zip codes in the column. How can I count that?
Select your column, then go to Data > Advanced (in the Sort & Filter group). Select 'Copy to another location', choose where the result should go, and check 'Unique records only'. Excel will copy the unique records to the selected location.
Now you can just select this new column and see how many records there are (for example, from the Count shown in the status bar).
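Copying only the unique records and then counting them amounts to counting distinct values. A minimal Python sketch of the same idea, with made-up zip codes standing in for the column; `unique_records` is a hypothetical helper that uses `dict.fromkeys` to keep the first copy of each value in its original order:

```python
from typing import Hashable, List


def unique_records(values: List[Hashable]) -> List[Hashable]:
    """Keep the first occurrence of each value, in the original order."""
    return list(dict.fromkeys(values))


zips = [2000, 3000, 2000, 4000, 3000, 2000]
unique = unique_records(zips)
print(unique, len(unique))
# prints [2000, 3000, 4000] 3
```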
The best thing to do is to do it in steps.
First grab that whole column of values. Dedupe it. Paste it into its own column.
Then, next to each unique value, put in this formula:
=COUNTIF($F$18:$F$27,G18)
where $F$18:$F$27 is the range to check and G18 is the number to check for. The dollar signs keep the range fixed when you copy the formula all the way down the list of unique values.
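This approach gives how often each value occurs rather than how many distinct values there are; the number of distinct values is simply the length of the deduplicated list. A minimal Python sketch of the dedupe-then-COUNTIF steps, with made-up zip codes; `occurrences` is a hypothetical helper built on `collections.Counter`:

```python
from collections import Counter
from typing import Hashable, List, Tuple


def occurrences(values: List[Hashable]) -> List[Tuple[Hashable, int]]:
    """Pair each unique value (in first-seen order) with how often it occurs."""
    counts = Counter(values)
    return [(value, counts[value]) for value in dict.fromkeys(values)]


zips = [2000, 3000, 2000, 4000, 3000, 2000]
print(occurrences(zips))
# prints [(2000, 3), (3000, 2), (4000, 1)]
```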
