Google Sheets - Count number of occurrences of word BEFORE comma

I'm trying to count the number of times specific words (names, in this case) occur in a column. However, each cell in that column may contain any number of names, and I'm only interested in the first one in each cell.
If a cell contains more than one name, the names are separated by commas, and I'm hoping to use that to ignore the names I don't want. It's very much like this question; the only difference is that everything after the first comma should be discarded.
Is there a way to do this in Sheets?

Does this formula work as you want (assuming your list of names is in the range A2:A):
=QUERY({ArrayFormula(IFERROR(LEFT(A2:A,SEARCH(",",A2:A)-1),A2:A)),A2:A},"select Col1, count(Col2) where Col1 <> '' group by Col1 label Col1 'Name'")
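To check the formula's logic outside Sheets, here is a minimal Python sketch (not part of the original answer) of the same two steps: keep the text before the first comma (the whole cell if there is no comma), skip blanks, then count each name. The function name `count_first_names` and the sample column are hypothetical. Unlike the formula, the sketch also trims surrounding whitespace from each name.

```python
from collections import Counter

def count_first_names(cells):
    """Count the name before the first comma in each cell, skipping blank cells."""
    counts = Counter()
    for cell in cells:
        if not cell:  # blank cell, like the "where Col1 <> ''" filter
            continue
        # Text before the first comma, or the whole cell if there is no comma
        # (the LEFT/SEARCH/IFERROR step in the formula).
        first = cell.split(",", 1)[0].strip()
        if first:
            counts[first] += 1  # the "group by Col1" / count step
    return counts

column = ["Alice, Bob", "Bob", "Alice", "", "Carol, Alice, Bob"]
print(sorted(count_first_names(column).items()))
# [('Alice', 2), ('Bob', 1), ('Carol', 1)]
```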

Related

Updating column values according to a specific combination of duplicates in R

I am still new to R and am attempting to solve a seemingly simple problem. I would like to identify all of the unique combinations of values across several columns, and update an additional column in my df to record whether or not each row is unique.
Given a df with columns A-Z, I have used the following code to identify unique combinations of columns A, B, C, D, and E. I am trying to update column F with this information.
unique(df[ ,c("A", "B","C","D", "E")])
This returns each of the rows with a unique combination, as expected, but I cannot figure out the next step to update column "F" with a value indicating that a row is unique. Thanks in advance for any pointers!
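No answer to this question is included here. In base R, `duplicated()` is the usual companion to `unique()`: `df$F <- !duplicated(df[, c("A", "B", "C", "D", "E")])` marks the first occurrence of each combination, and `!(duplicated(x) | duplicated(x, fromLast = TRUE))` marks combinations that occur exactly once. The Python sketch below shows the same two interpretations of "unique". The function name `flag_combinations` and the three-column sample data are hypothetical.

```python
from collections import Counter

def flag_combinations(rows, keys):
    """Return (first_seen, appears_once) flags for each row's key combination."""
    combos = [tuple(row[k] for k in keys) for row in rows]
    totals = Counter(combos)  # how often each combination occurs overall
    seen = set()
    first_seen, appears_once = [], []
    for combo in combos:
        first_seen.append(combo not in seen)     # like R's !duplicated()
        seen.add(combo)
        appears_once.append(totals[combo] == 1)  # combination never repeats
    return first_seen, appears_once

rows = [
    {"A": 1, "B": "x", "C": 0},
    {"A": 1, "B": "x", "C": 0},
    {"A": 2, "B": "y", "C": 0},
]
print(flag_combinations(rows, ["A", "B", "C"]))
# ([True, False, True], [False, False, True])
```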

Filter in one column then filter occurrences of a word

How would I count the number of occurrences after filtering a dataset on one column (e.g. the "Variant" column for "A/T") and then filtering another column for values containing a particular string (e.g. the "SEQ" column for "G[A]C")?
I've tried the following but received an error:
length(which(mydata$VARIANT =="A/T") & grep(length("G[A]A", mydata$SEQ)))
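Checking in Excel, filtering for just 'A/T' shows 9 rows, and 2 of those contain 'G[A]C'.
When using &, both sides need to be logical vectors. grep returns the indices of the matches, whereas grepl returns a logical vector, so use grepl instead of grep. Because [A] would otherwise be read as a regular-expression character class, fixed = TRUE makes the pattern match literally: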
sum(mydata$VARIANT == "A/T" & grepl("G[A]A", mydata$SEQ, fixed = TRUE))
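The answer relies on two things: combining two logical conditions row by row, and matching "G[A]A" as a literal string rather than as a regex. The Python sketch below mirrors both ideas. The function name `count_matches` and the sample `variant`/`seq` values are hypothetical. The last line shows why the literal match matters: as a regex, "[A]" is a character class, so the pattern only matches the plain text "GAA".

```python
import re

# Hypothetical sample columns.
variant = ["A/T", "A/T", "C/G", "A/T"]
seq = ["TTG[A]ACC", "GAAT", "G[A]A", "CG[A]AT"]

def count_matches(variant, seq, wanted_variant, fragment):
    """Count rows where variant equals wanted_variant AND seq contains fragment literally."""
    # `fragment in s` is a plain substring test, comparable to grepl(..., fixed = TRUE).
    return sum(v == wanted_variant and fragment in s
               for v, s in zip(variant, seq))

print(count_matches(variant, seq, "A/T", "G[A]A"))  # 2 (first and last rows)

# Treated as a regex, "G[A]A" means "GAA", so only "GAAT" matches.
print(sum(bool(re.search("G[A]A", s)) for s in seq))  # 1
```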

Change all cells marked X into cell value shown in column P

I need an easy way to convert all X's in a column into the value shown in a cell.
Basically, we want to sell multiple products to a client, with a target order value split among the relevant products. I have used a COUNTA formula to show how many of the columns are not blank, then a simple division to split the total value over those columns (if 2 columns are marked X, it would be 10,000 / 2, assuming the target value is 10k). Now I need to change all the X's into the figure shown in that cell, as in the picture.
I can't for the life of me think of an easy way to do it, but surely there is one?
Screen shot of sheet
You need 2 sets of columns. Your first set has the X's and the blanks to mark which categories are applicable. The second set has the calculated values for the selected categories.
In the value columns, you can use a formula like the following for the first data row and first category:
=IF(E2 = "x", $D2, 0)
This assumes "D" is your "Divided total" column, "E" is your "Dairy & S..." column, and "2" is your first data row.
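The arithmetic behind the two-column approach is: count the marked categories, divide the target by that count, then put the share wherever there is a mark and 0 elsewhere (what `=IF(E2 = "x", $D2, 0)` does for one cell). Below is a minimal Python sketch of one row, using the question's 10,000 example. The function name `split_target` is hypothetical, and the sketch treats "X" and "x" alike and returns zeros when nothing is marked, to avoid dividing by zero.

```python
def split_target(target, marks):
    """Replace each 'x' mark (any case) with an equal share of target; other cells get 0."""
    marked = [m.strip().lower() == "x" for m in marks]
    n = sum(marked)  # plays the role of the COUNTA step
    if n == 0:
        return [0] * len(marks)  # nothing selected: avoid dividing by zero
    share = target / n  # the "divided total" column
    return [share if is_marked else 0 for is_marked in marked]

print(split_target(10_000, ["X", "", "x", ""]))
# [5000.0, 0, 5000.0, 0]
```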

How to extract specific rows depending on part of the strings in one column in R

In R, I'm trying to extract the rows that contain a specific string in one column.
The data looks like this:
ERC1 20679 14959 9770 RAB6-interacting protein 2 isoform
I want to extract the rows that have RAB6 in the last column. That column contains other words besides RAB6, so I cannot use column == "RAB6" to get them. It's just like the search function in Excel. Does anyone have any ideas?
Assuming that your data frame is df:
df[grep("^RAB6", df$column),]
If not all values start with RAB6, remove the ^.
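The grep answer keeps rows whose column matches a regular expression: "^RAB6" anchors the match to the start of the text, and plain "RAB6" matches anywhere. The Python sketch below shows the same difference. Only the ERC1 row comes from the question; the other rows and the function name `filter_rows` are hypothetical.

```python
import re

rows = [
    ["ERC1", 20679, 14959, 9770, "RAB6-interacting protein 2 isoform"],
    ["ABC2", 100, 200, 300, "unrelated protein"],        # hypothetical row
    ["XYZ3", 1, 2, 3, "putative RAB6 effector"],         # hypothetical row
]

def filter_rows(rows, col, pattern):
    """Keep rows whose column `col` matches the regex `pattern` anywhere (like grep)."""
    return [r for r in rows if re.search(pattern, r[col])]

print([r[0] for r in filter_rows(rows, -1, "^RAB6")])  # ['ERC1']
print([r[0] for r in filter_rows(rows, -1, "RAB6")])   # ['ERC1', 'XYZ3']
```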

extract columns that don't have a header or name in R

I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the second two.
I know how to do it when the columns have names, e.g. plot(data$V1, data$V2), but in this case they do not. How do I access each column individually when they have no names?
Thanks
Why not give them sensible names?
names(data)=c("This","That","Other")
plot(data$This,data$That)
That's a better solution than using column numbers: names are meaningful, and if your data changes to have a different number of columns, position-based code may break in several places. Give your data the correct names, and as long as you always refer to data$This, your code will work.
I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The first number in the brackets refers to rows and the second to columns. Here the row index is left empty, so all rows of column 4 are selected, i.e. the whole column.
This is easy to remember since it follows matrix notation: a 4x3 matrix has 4 rows and 3 columns. So to select the 1st row of the 3rd column, I could use matrix[1,3].
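The two answers describe positional access (row first, then column) and attaching names once so later code can refer to columns by name. Below is a small Python sketch of both ideas on hypothetical headerless data. Note that Python indexes from 0 while R indexes from 1, so R's `matrix[1,3]` corresponds to `data[0][2]` here. The helper `column` and the sample values are hypothetical.

```python
# Hypothetical headerless data: three unnamed columns per row.
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

def column(rows, j):
    """All rows of column j (0-based), like dataset[, j + 1] in R."""
    return [row[j] for row in rows]

# Positional access: first row, third column.
print(data[0][2])        # 3.0
print(column(data, 1))   # [2.0, 5.0]

# Named access: attach names once, then refer to columns by name.
names = ["This", "That", "Other"]
named = {name: column(data, j) for j, name in enumerate(names)}
print(named["This"], named["That"])  # [1.0, 4.0] [2.0, 5.0]
```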
