COUNTIF of non-empty and non-blank cells - count

In Google Sheets I want to count the number of cells in a range (C4:U4) that are non-empty and non-blank. Counting non-empty cells is easy with COUNTIF. The tricky part is that I want to treat cells containing only one or more blanks (spaces) as empty. (My users keep leaving blanks in cells, which are not visible, and I waste a lot of time cleaning them up.)
=COUNTIF(C4:U4,"<>") treats a cell with one or more blanks as non-empty and counts it. I've also tried =COUNTA(C4:U4) but that suffers from the same problem of counting cells with one or more blanks.
I found a solution on Stack Overflow flagged as a solution by 95 people, but it doesn't work for cells with blanks.
After much reading I have come up with a fancy formula:
=COUNTIF(FILTER(C4:U4,TRIM(C4:U4)>="-"),"<>")
The idea is that TRIM removes leading and trailing blanks before FILTER tests whether the cell is greater than or equal to a hyphen (the lowest-ordered printable character I could find). The FILTER function then returns an array to COUNTIF that contains only non-empty and non-blank cells. COUNTIF then tests that array against "<>".
This works (or at least "seems" to work), but I was wondering if I've missed something really obvious. Surely the problem of hidden blanks is very common and has been around since the dawn of Excel and Google Sheets. There must be a simpler way.
(My first question so apologies for any breaches of forum rules.)

I don't know about Google. But for Excel you could use this array formula for multiple contiguous columns:
=ROWS(A1:B10) * COLUMNS(A1:B10)-(COUNT(IF(ISERROR(CODE(A1:B10)),1,""))+COUNT(IF(CODE(A1:B10)=32,1,"")))

You could try this, but I'm not at all sure about it:
=SUMPRODUCT(--(trim((substitute(A2:A5,char(160),"")))<>""))
It seems that in Google Sheets you have to use char(160) to match a space entered into a cell?
This seems to be due to a non-breaking space, and could possibly apply to Excel as well, as explained here. The suggestion is that you could also pass the text through the CLEAN function to eliminate invisible characters with codes in the range 0-31.
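To make the logic of that formula easier to follow outside a spreadsheet, here is a minimal Python sketch of the same idea: treat non-breaking spaces (code 160) as ordinary spaces, drop control characters with codes 0-31, trim, and count what is left. The function name count_non_blank and the sample row are made up for illustration; this is not how Sheets or Excel implement anything internally.

```python
def count_non_blank(cells):
    """Count cells that still contain text once invisible characters are removed."""
    count = 0
    for cell in cells:
        text = "" if cell is None else str(cell)
        # Treat non-breaking spaces (code 160) like ordinary spaces.
        text = text.replace("\u00a0", " ")
        # Drop control characters with codes 0-31, similar in spirit to CLEAN.
        text = "".join(ch for ch in text if ord(ch) > 31)
        # What remains must be more than plain spaces to count.
        if text.strip() != "":
            count += 1
    return count


# Hypothetical row: only "a", " b " and 0 hold real content.
row = ["a", "", "   ", "\u00a0", None, " b ", "\t", 0]
print(count_non_blank(row))
```

Note that the number 0 is counted as content here, which is a choice of this sketch rather than something stated in the answers above.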

I found another way to do it using:
=ARRAYFORMULA(SUM(IF(TRIM($C4:$U4)<>"",1,0)))
I'm still looking for a simpler way to do it if one is available.

This should work:
=countif(C4:U4,">""")
I found this solution here:
Is COUNTA counting blank (empty) cells in new Google spreadsheets?
Please let me know if it does.

=COLUMNS(C4:U4)-COUNTBLANK(C4:U4)
This will count how many cells are in your range (C4 to U4 = 19 cells), and subtract those that are truly "empty".
Blank spaces will not get counted by COUNTBLANK, despite its name, which should really be COUNTEMPTY.

Related

Is there a way to extract a substring from a cell in OpenOffice Calc?

I have tens of thousands of rows of unstructured data in csv format. I need to extract certain product attributes from a long string of text. Given a set of acceptable attributes, if there is a match, I need it to fill in the cell with the match.
Example data:
"[ROOT];Earrings;Brands;Brands>JeweleryExchange;Earrings>Gender;Earrings>Gemstone;Earrings>Metal;Earrings>Occasion;Earrings>Style;Earrings>Gender>Women's;Earrings>Gemstone>Zircon;Earrings>Metal>White Gold;Earrings>Occasion>Just to say: I Love You;Earrings>Style>Drop/Dangle;Earrings>Style>Fashion;Not Visible;Gifts;Gifts>Price>$500 - $1000;Gifts>Shop>Earrings;Gifts>Occasion;Gifts>Occasion>Christmas;Gifts>Occasion>Just to say: I Love You;Gifts>For>Her"
Look up table of values:
Zircon, Diamond, Pearl, Ruby
Output:
Zircon
I tried using the VLOOKUP() function, but it needs to match an entire cell and works better for translating acronyms. I haven't found a built-in function that accomplishes what I need. The data is totally unstructured and changes from row to row, with no consistency even within variations of the same product. Does anyone have an idea how to do this, or how to write an OpenOffice Calc function to accomplish it? I'm also open to better methods if anyone has experience or ideas on how to approach this.
OK, so I figured out how to do this on my own. I created many different columns, each with a keyword I was looking to extract as its header.
Then I used this formula to extract the keywords into the correct row beneath the column header:
=IF(ISERROR(SEARCH(CF$1,$D769)),"",CF$1)
The SEARCH function returns the numeric position of the search string if it is found; otherwise it produces an error. I use ISERROR to detect the error condition, and the IF statement so that on an error the cell is left blank; otherwise it takes the value of the header. I had over 100 columns of specific information to extract, plus one final column where I join all the previous cells in the row together for the final list. It worked like a charm. I recommend this approach to anyone who has to do a similar task.
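For anyone who would rather do this outside the spreadsheet, the same "one test per keyword" idea can be sketched in a few lines of Python. This is only an illustration: the function name extract_keywords is made up, the sample string is a shortened version of the example data, and the match ignores case as a design choice of the sketch.

```python
def extract_keywords(text, keywords):
    """Return the keywords that occur anywhere in text, ignoring case."""
    lowered = text.lower()
    # One substring test per keyword, like one IF(ISERROR(SEARCH(...))) column each.
    return [kw for kw in keywords if kw.lower() in lowered]


# Shortened version of the example data and the lookup table from the question.
row = "Earrings>Gemstone>Zircon;Earrings>Metal>White Gold;Gifts>For>Her"
lookup = ["Zircon", "Diamond", "Pearl", "Ruby"]
print(", ".join(extract_keywords(row, lookup)))  # prints Zircon
```

Joining the matches with ", " plays the role of the final column that concatenates all the per-keyword cells.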

Benefit of interpreting blank space elements as valid factor elements in the R function factor()?

The base R function factor() interprets character elements consisting of blank space as valid factor elements instead of NA. What is the benefit of interpreting blank space character elements like this? Is it a legacy feature that is kept as it is to maintain compatibility?
Example:
factor(c("a","a","","b"))
I realize that this isn't an ordinary problem that can be solved with a reproducible example as a starting point, but I decided to give it a try anyway. The design decision to have factor() interpret blank space character elements like this confounds me. It seems to me that it would simplify things with no clear disadvantages to interpret these elements as NA instead.
What is the benefit of interpreting blank space character elements like this?
Because empty string data usually means “this is an empty string”, and not “this is missing data”.
It depends on the usage, of course: an empty “name” field is most likely missing data. But an empty “title” field is just that: no title. How else would you encode the lack of a title (assuming “Mr” and “Mrs” have a separate field, which may not be the case)?
For factors, having empty labels makes less sense. However, R tends to convert strings to factors quite liberally (especially when reading tabular data from files), and treating all those empty values as NA would cause a lot of mis-annotated data. In general, such implicit conversions should always be lossless, i.e. preserve the whole domain of values being converted.

Count Values comma separated non-numeric (Google Sheets)

I'm trying to figure out how to get a count of answers inside a cell which are comma-separated in this format: Anna, peter, Hans, Otto (here it should be 4)
I need this for an assignment, nothing seems to work, and my programming skills are very limited, so I hope someone can help me out here :/
I tried it in Excel first with this formula:
=LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1),",",""))+1
..which didn't work (the brackets around the first A1 and after SUBSTITUTE turned red. What is that telling us, anyway? My searches only show me entries about negative values.)
Then I tried this formula in a Google spreadsheet:
=COUNTA(SPLIT(A1; ","))
..which also didn't work (here I simply get an error).
I guess it's about the values being non numeric? Any ideas?
It is possible that you need:
=LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1);",";""))+1
If your Regional Settings require it. (see Scott's comment)
This should do the trick
=LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1
It just counts the commas and adds 1
Update
I just realized that's pretty much the same as what you had - just without using TRIM which isn't necessary. Your formula should work too.
In Excel, use:
=LEN(A1)-LEN(SUBSTITUTE(A1,",","")) + 1
Unless there is a chance of A1 having no value, in which case you need to expand it further:
=IF(LEN(A1)>0,LEN(A1)-LEN(SUBSTITUTE(A1,",","")) + 1,0)
Since you also tagged Google Spreadsheets, use this there:
=COUNTA( SPLIT(A1, ",", TRUE))
The same applies to the possibility of an empty field in Google Sheets.
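To show why the empty-cell guard matters, here is a small Python sketch of both approaches: counting commas and adding 1, and splitting on commas and counting the non-empty pieces. The function names are made up for illustration and are not spreadsheet functions.

```python
def count_by_commas(cell):
    """Count items as (number of commas + 1), with 0 for an empty cell."""
    if cell is None or cell.strip() == "":
        return 0  # without this guard an empty cell would report 1
    return cell.count(",") + 1


def count_by_split(cell):
    """Split on commas and count only the pieces that contain text."""
    if cell is None:
        return 0
    return len([part for part in cell.split(",") if part.strip() != ""])


print(count_by_commas("Anna, peter, Hans, Otto"))  # prints 4
print(count_by_split("Anna, peter, Hans, Otto"))   # prints 4
```

The two differ when a cell has a stray trailing comma: "Anna, peter," gives 3 by comma counting but 2 by splitting, so pick whichever matches how your data is entered.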

Replacing a symbol in a .txt file

Alright, I've been given a program that requires me to take a .txt file of varying symbols in rows and columns that would look like this.
..........00
...0....0000
...000000000
0000.....000
............
..#########.
..#...#####.
......#####.
...00000....
Using command-line arguments to specify a row and column, I have to select a symbol and replace it with an asterisk. The problem I have is that I then have to recurse up, down, left, and right to any occurrence of the same symbol and change those into asterisks as well.
As I understand it, if I were to enter "1 2" into my argument list, it would change the above text into:
**********00
***0....0000
***000000000
0000.....000
............
..#########.
..#...#####.
......#####.
...00000....
While selecting the specified character itself isn't a problem, how do I get any similar, adjacent symbols to change, and then the ones next to those? I have looked around but can't find any information, and as my class has had different substitute teachers for the last 3 weeks, I haven't had a chance to clarify my questions with my teacher. I've been told that recursion can be used, but my actual experience using recursion is limited. Any suggestions or links I can follow to get a better idea of what to do? Would it make sense to add a recursive method that takes the given coordinates, adds to and subtracts from the row and column respectively to check whether the symbol is the same, and repeats?
Load the file character by character, row by row, into a 2D array of characters. That makes it a lot easier to move up, down, left, and right: all you need to do is change one of the array indexes.
You can also take advantage of recursion. Make a function that changes all adjacent matching characters, and then call that same function on all adjacent matching characters.
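A minimal sketch of that recursive idea in Python, assuming zero-based row and column numbers and four-way adjacency (up, down, left, right). The function names fill and replace_region and the tiny sample grid are made up for illustration; the question does not say which language the assignment uses.

```python
def fill(grid, row, col, target):
    """Replace target at (row, col) with '*' and recurse into the four neighbours."""
    # Stop at the edges of the grid.
    if row < 0 or row >= len(grid) or col < 0 or col >= len(grid[row]):
        return
    # Stop at cells that do not hold the symbol being replaced.
    if grid[row][col] != target:
        return
    grid[row][col] = "*"
    fill(grid, row - 1, col, target)  # up
    fill(grid, row + 1, col, target)  # down
    fill(grid, row, col - 1, target)  # left
    fill(grid, row, col + 1, target)  # right


def replace_region(lines, row, col):
    """Return a copy of lines with the region around (row, col) turned into '*'."""
    grid = [list(line) for line in lines]  # 2D array of characters
    target = grid[row][col]
    if target != "*":  # avoid endless recursion when the start is already '*'
        fill(grid, row, col, target)
    return ["".join(chars) for chars in grid]


for line in replace_region(["..0", ".00", "0.."], 0, 0):
    print(line)
```

Marking a cell as "*" before recursing is what stops the function from revisiting it. On the sample grid in the question, a four-way fill started at row 1, column 2 would also reach the dots to the right of the first 0 in that row, because they touch the dots in row 0. For very large regions, Python's default recursion limit may be reached, in which case an explicit stack or queue is the usual alternative.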

Cleansing an excel spreadsheet with whitespace cells

I'm looking for advice about how to cleanse an Excel spreadsheet using R.
http://www.abs.gov.au/AUSSTATS/abs#.nsf/DetailsPage/5506.02012-13?OpenDocument
Gathering the years by tidyr::gather is simple enough. The difficulty is the subgroups. The groups are defined by whitespace. Each amount of whitespace is a subgroup.
My question is how to assign each row to its group, so that the table is in tidy form.
My initial instinct was to look where there is a line of NAs in the spreadsheet and use na.locf to fill them, but that method cannot distinguish between subgroups followed by groups without subgroups. Is there a way to count the amount of whitespace visible before the cells in the linked Excel spreadsheet?
On the particular sheet you are talking about, there aren't any leading characters - the indentation is just the formatting applied to the cell, in much the same way as you might apply a font to a cell.
The only way to count the indents in the formatting is to create a macro. Here's a user-defined function that will work:
Public Function inds(r As Excel.Range) As Integer
    ' Return the indent level of the first cell in the given range.
    inds = r.Cells(1, 1).IndentLevel
End Function
You would then just count the indents with =inds(a3)
It looks like you might be trying to prepare the data for a pivot table (there might be better options). However, to count the leading spaces, a simple formula is:
=len(a3)-len(trim(a3))+1
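Once each row has an indent number (from the macro or from a leading-space count), assigning rows to their groups is a matter of tracking the most recent row at each shallower level. Here is a rough Python sketch of that step, assuming the indent levels have already been exported alongside the labels; the function name assign_groups and the sample labels are invented for illustration and are not taken from the linked spreadsheet.

```python
def assign_groups(rows):
    """rows: list of (indent_level, label). Returns (label, list_of_parent_labels) pairs."""
    stack = []   # chain of (indent, label) ancestors for the current row
    result = []
    for indent, label in rows:
        # Drop entries at the same or a deeper level: they cannot be parents.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        result.append((label, [name for _, name in stack]))
        stack.append((indent, label))
    return result


# Invented labels with their indent levels.
rows = [
    (0, "Total"),
    (1, "Taxes on income"),
    (2, "Individuals"),
    (2, "Companies"),
    (1, "Taxes on property"),
]
for label, parents in assign_groups(rows):
    print(label, "<-", " > ".join(parents))
```

Unlike filling down from rows of NAs, this handles a subgroup that is followed directly by a group with no subgroups, because each row's parents are decided only by the indent levels.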
