Count comma-separated non-numeric values (Google Sheets)

I'm trying to figure out how to count the comma-separated answers inside a cell, in this format: Anna, peter, Hans, Otto (here the result should be 4).
I need this for an assignment, nothing seems to work, and my programming skills are very limited, so I hope someone can help me out here :/
I tried it in Excel first with this formula:
=LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1),",",""))+1
..which didn't work (the brackets around the first A1 and after SUBSTITUTE turned red. What is that telling us, anyway? My search only shows me entries about negative values.)
Then I tried this formula in Google Sheets:
=COUNTA(SPLIT(A1; ","))
..which also didn't work (here I simply get an error).
I guess it's about the values being non numeric? Any ideas?

If your regional settings require semicolons as argument separators, you may need:
=LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1);",";""))+1
(See Scott's comment.)

This should do the trick
=LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1
It just counts the commas and adds 1.
Update
I just realized that's pretty much the same as what you had, just without TRIM, which isn't necessary. Your formula should work too.

In Excel, use:
=LEN(A1)-LEN(SUBSTITUTE(A1,",","")) + 1
If there is a chance that A1 has no value, you need to extend it further:
=IF(LEN(A1)>0,LEN(A1)-LEN(SUBSTITUTE(A1,",","")) + 1,0)
Since you also tagged Google Spreadsheets, use this there:
=COUNTA( SPLIT(A1, ",", TRUE))
The same applies to the possibility of an empty field in Google Sheets.
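For anyone who wants to check the logic outside a spreadsheet, here is a minimal Python sketch of the two approaches above: counting the commas and adding 1, and splitting on commas and counting the pieces. The function names are hypothetical, and ignoring whitespace-only pieces in the split version is an assumption rather than exact SPLIT behaviour.

```python
def count_by_commas(cell: str) -> int:
    """Mirror of IF(LEN(A1)>0, LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1, 0)."""
    if len(cell) == 0:
        return 0  # empty cell: no items at all
    # Removing the commas shortens the text by exactly the number of commas.
    return len(cell) - len(cell.replace(",", "")) + 1


def count_by_split(cell: str) -> int:
    """Rough analogue of COUNTA(SPLIT(A1, ",")): count the non-empty pieces.

    Assumption: pieces that are empty or only whitespace are ignored.
    """
    return sum(1 for part in cell.split(",") if part.strip() != "")


if __name__ == "__main__":
    print(count_by_commas("Anna, peter, Hans, Otto"))
    print(count_by_split("Anna, peter, Hans, Otto"))
```

The two versions disagree on input such as "a,,b": the comma-counting version counts the empty slot between the commas, while the split version does not.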

Related

R tables::tabular() use of Format() to get rid of scientific notation and round

I have a quick question. I am using tables::tabular() and have some summary statistics which are displayed in scientific e notation.
I found out that using Format(scientific=FALSE) * mean() in this context gets rid of the scientific e notation in the means and the other summary statistics I present.
Now I also want to round numbers such as 0.123456789 (most of the means are means of ratios between 0 and 1, but not all) so that they show the digits to the left of the decimal point and at most 4 to the right, i.e. 0.1234. I tried simply passing the digits=4 option to Format(), and while it seems to work on its own, it somehow doesn't when scientific=FALSE is also there. With Format(scientific=FALSE, digit=1), I still get 5 digits to the right of the decimal point.
Do you know what is happening and how I can fix it?
Highly appreciate your help,
cork
Have you already tried to use a global statement at the beginning like this: options("digits" = 4)?
All the best,
Patrick

COUNTIF of non-empty and non-blank cells

In Google Sheets I want to count the number of cells in a range (C4:U4) that are non-empty and non-blank. Counting non-empty cells is easy with COUNTIF. The tricky part is that I want to treat cells containing only one or more blanks as empty. (My users keep leaving blanks in cells, which are not visible, and I waste a lot of time cleaning them up.)
=COUNTIF(C4:U4,"<>") treats a cell with one or more blanks as non-empty and counts it. I've also tried =COUNTA(C4:U4), but that suffers from the same problem of counting cells with one or more blanks.
I found an answer on Stack Overflow that 95 people flagged as a solution, but it doesn't work for cells with blanks.
After much reading I have come up with a fancy formula:
=COUNTIF(FILTER(C4:U4,TRIM(C4:U4)>="-"),"<>")
The idea is that TRIM removes leading and trailing blanks before FILTER tests whether the cell is greater than or equal to a hyphen (the lowest-ordered printable character I could find). The FILTER function then returns an array to COUNTIF that contains only non-empty and non-blank cells. COUNTIF then tests against "<>".
This works (or at least "seems" to work), but I was wondering if I've missed something really obvious. Surely the problem of hidden blanks is very common and has been around since the dawn of Excel and Google Sheets. There must be a simpler way.
(My first question so apologies for any breaches of forum rules.)
I don't know about Google. But for Excel you could use this array formula for multiple contiguous columns:
=ROWS(A1:B10) * COLUMNS(A1:B10)-(COUNT(IF(ISERROR(CODE(A1:B10)),1,""))+COUNT(IF(CODE(A1:B10)=32,1,"")))
You could try this, but I'm not at all sure about it:
=SUMPRODUCT(--(trim((substitute(A2:A5,char(160),"")))<>""))
It seems that in Google Sheets you have to use char(160) to match a space entered into a cell.
This appears to be due to a non-breaking space and could possibly apply to Excel too, as explained here. The suggestion there is that you could also pass the text through the CLEAN function to eliminate invisible characters with codes in the range 0-31.
I found another way to do it using:
=ARRAYFORMULA(SUM(IF(TRIM($C4:$U4)<>"",1,0)))
I'm still looking for a simpler way to do it if one is available.
This should work:
=countif(C4:U4,">""")
I found this solution here:
Is COUNTA counting blank (empty) cells in new Google spreadsheets?
Please let me know if it does.
=COLUMNS(C4:U4)-COUNTBLANK(C4:U4)
This counts how many cells are in your range (C4 to U4 = 19 cells) and subtracts those that are truly "empty".
Cells that contain only blank spaces are not counted by COUNTBLANK, despite its name, which should really be COUNTEMPTY.
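The trim-then-compare idea behind the TRIM and char(160) formulas above can be sketched in Python as follows. This is only an illustration: the function name and the sample row are made up, and treating a non-breaking space (character 160) like an ordinary space is the assumption taken from the SUBSTITUTE answer.

```python
def count_non_blank(cells) -> int:
    """Count cells that still contain something after trimming whitespace."""
    count = 0
    for value in cells:
        text = "" if value is None else str(value)
        # Treat non-breaking spaces (char 160) like ordinary spaces, then trim.
        text = text.replace("\u00a0", " ").strip()
        if text != "":
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical row: real text, empty, spaces only, NBSP only, missing, padded text, zero.
    row = ["abc", "", "   ", "\u00a0", None, " x ", 0]
    print(count_non_blank(row))  # prints 3
```

Only "abc", " x " and 0 survive the trim, so a cell that looks empty but holds spaces is not counted.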

Removing duplicate in wordcloud in r

I am generating a word cloud of my tweets. The problem is that I am getting duplicates like the ones shown below, which are treated as separate words in my word cloud instead of one.
1) myname
2) "myname
3) myname"
My other problem is that I am also getting some symbols in the word cloud,
like ^ ~ etc. How can I get rid of these symbols?
#docendodiscimus answer solved my problem, but I am now getting meaningless words in my cloud like 'sadi24', 'yu1' etc., even though I removed hashtags and # words. How can I get rid of them?
This is the output where I can see this happening, but there may be many other words suffering from this problem. Please share your thoughts on this.
Please note that I may have numerous similar issues, so please provide a solution that I can easily generalize to all the others.
I am providing a screenshot of other data that has the problem.
Here I am getting words such as manager185878 and sadi24. You can see that the output contains some absurd symbols even after removing the punctuation.

grep or gsub for everything except a specific string in R

I'm trying to match everything except a specific string in R, and I've seen a bunch of posts on this suggesting a negative lookaround, but I haven't gotten that to work.
I have a dataset looking at crime incidents in SF, and I want to sort cases that have a resolution or do not. In the resolution field, cases have things listed like arrest booked, arrest cited, juvenile booked, etc., or none. I want to relabel all the specific resolutions like the different arrests to "RESOLVED" and keep the instances with "NONE" as such. So, I thought I could gsub or grep for not "NONE".
Based on what I've read on finding all strings except one specific string, I would have thought this would work:
resolution_vector = grep("^(?!NONE$).*", trainData$Resolution, fixed=TRUE)
Where I make a vector that searches through my training dataset, specifically the resolution column, and finds the terms that aren't "NONE". But, I just get an empty vector.
Does anyone have suggestions, or know why this might not be working in R? Or, even if there was a way to just use gsub, how do I say "not NONE" for my regex in R?
trainData$Resolution = gsub("!NONE", RESOLVED, trainData$Resolution) << what's the way to negate the string here?
Based on your explanation, it seems as though you don't need regular expressions (i.e. gsub()) at all. You can use != since you are looking for all non-matches of an exact string. Perhaps you want
trainData <- within(trainData, {
    ## next line only necessary if you have a factor column
    Resolution <- as.character(Resolution)
    Resolution[Resolution != "NONE"] <- "RESOLVED"
})
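resolution_vector = grep("^(?!NONE$).*", trainData$Resolution, perl=TRUE)
You need to use the option perl=TRUE for the lookahead to work, and drop fixed=TRUE, which makes grep treat the pattern as a literal string and ignore perl=TRUE.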
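To show that the lookahead pattern and the plain inequality test select the same values, here is a small Python sketch using the same regex. The function names and sample values are made up for illustration, and Python's re module stands in for R's perl=TRUE engine.

```python
import re

# Same pattern as in the question: match any string that is not exactly "NONE".
NOT_NONE = re.compile(r"^(?!NONE$).*")


def relabel_with_regex(resolutions):
    """Relabel everything except the exact string "NONE" using the lookahead."""
    return ["RESOLVED" if NOT_NONE.match(r) else r for r in resolutions]


def relabel_with_inequality(resolutions):
    """Same result without a regex, as in the != approach."""
    return ["RESOLVED" if r != "NONE" else r for r in resolutions]


if __name__ == "__main__":
    sample = ["ARREST BOOKED", "NONE", "JUVENILE BOOKED", "NONE FOUND"]
    print(relabel_with_regex(sample))
    print(relabel_with_inequality(sample))
```

Because the lookahead is anchored with `$`, a value such as "NONE FOUND" is still relabelled; only the exact string "NONE" is kept.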

How can I count the number of comma separated numbers in a google spreadsheet?

I've got a cell which has values
1,2,3,4
I need a formula that returns 4 in another cell, however this Google spreadsheet looks really complicated. Also I need to trim because I might have white spaces between the numbers.
One option is to use the following formula:
=COUNT(SPLIT(A1; ","))
Just so it is not missed: when counting non-numeric items, use COUNTA instead, since COUNT only counts numbers (thanks #noway and #wchiquito, as COUNT did not work for my text):
=COUNTA(SPLIT(A1,","))
I came a bit late to the party, but:
if you want to count 0 when there is nothing in the cell...
=COUNTA(SPLIT(A1, ",")) - NOT(LEN(A1))
Try the formula below:
=LEN(TRIM(A1))-LEN(SUBSTITUTE(TRIM(A1),",",""))+1
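The difference between COUNT and COUNTA in the answers above, plus the trimming the question asks for, can be sketched in Python like this. The helper names are hypothetical, and using float() to decide what counts as a number is an assumption that only roughly matches how a spreadsheet parses numbers.

```python
def split_items(cell: str):
    """Split on commas, trim each piece, and drop pieces that end up empty."""
    return [part.strip() for part in cell.split(",") if part.strip() != ""]


def is_number(text: str) -> bool:
    """Assumption: anything float() accepts is treated as numeric."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def count_numeric(cell: str) -> int:
    """Like COUNT(SPLIT(...)): only numeric pieces are counted."""
    return sum(1 for item in split_items(cell) if is_number(item))


def count_all(cell: str) -> int:
    """Like COUNTA(SPLIT(...)) with an empty cell counted as 0."""
    return len(split_items(cell))


if __name__ == "__main__":
    print(count_numeric("1, 2 ,3,4"))
    print(count_numeric("Anna, peter"))
    print(count_all("Anna, peter"))
```

This is why COUNT works for 1,2,3,4 but yields 0 for names, while the COUNTA-style count handles both.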

Resources