count numbers in column not duplicates - xls

I have an Excel file with about 5500 rows. One column is all numbers with many repeats, like zip codes. How can I count how many distinct numbers are in that column, eliminating duplicates? For example, maybe there are only 250 unique zip codes in the column. How can I count that?

Select your column, then go to Data -> Advanced (the Advanced Filter). Choose 'Copy to another location', pick the destination, and check 'Unique records only'; Excel will copy the unique records to the selected column.
Now you can just select that whole column and see how many records there are.
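Rather than eyeballing the selection count, a formula can report it directly; H2:H5500 below is just a placeholder for wherever the unique records were copied:
=COUNTA(H2:H5500)
COUNTA counts every non-empty cell in the range.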

The best thing to do is do it in steps.
First grab that whole column of values, dedupe it, and paste it into its own column.
Then down the line, put in this formula:
=COUNTIF(F18:F27,G18)
where F18:F27 is the range to check and G18 is the number to check for. Copy that formula all the way down the list of unique values.
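If only the count of distinct values is needed (not the per-value tallies), a single formula can skip the dedupe step entirely; this is a sketch assuming the zip codes sit in F18:F27 as above and that the range has no blanks:
=SUMPRODUCT(1/COUNTIF(F18:F27,F18:F27))
Each value contributes 1/n for each of its n occurrences, so every distinct value sums to exactly 1.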

Related

str_extract numbers from a specific row that starts with a given string name

I have a set of numbers I need to extract and I am having trouble. I am trying to extract the numbers from the row named 'waiting' in a data set like the one above into a new column, but the result is only correct in some rows; in others I get the last numbers from other rows.
I would like to get the values from that specific row, including decimals, into a new column without picking up the other numbers.
Please try this:
library(stringr)

data <- c('t1 waiting 1234', 'waiting 1234.5')
# str_extract() returns one match per element as a character vector;
# str_extract_all() returns a list, which trimws() does not handle well.
# '\\s\\d[\\d.]*' matches a run of digits (with an optional decimal point)
# preceded by whitespace, so trailing text and later numbers are ignored.
data <- trimws(str_extract(data, '\\s\\d[\\d.]*'))
data
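Alternatively, if the number always directly follows the word 'waiting' (an assumption based on the two example strings), a lookbehind avoids the trimws() step:
str_extract(c('t1 waiting 1234', 'waiting 1234.5'), '(?<=waiting )[\\d.]+')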

How to skip empty rows while reading multiple tabs in R?

I am trying to read an excel file with multiple tabs. For that, I use the code provided here.
The problem is that each tab has a different number of empty rows before the actual data begins. For example, the first tab has two empty rows, the second tab has three empty rows, and so on.
Normally, I would use the parameter skip in the read_excel function to indicate the number of empty lines to skip. But how do I do that for multiple tabs with different numbers of rows to skip?
Perhaps the easiest solution would be to read each tab as it is and then remove the empty rows, i.e. yourdata <- yourdata[!is.na(yourdata$columnname),]. This works if you don't expect any NAs in a particular column, like an id. If you have data gaps everywhere, you can test for all NAs across multiple columns instead - let me know if that's what you need.
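A minimal sketch of that approach with readxl; the file name 'workbook.xlsx' and the 'id' column used for the NA test are assumptions, not from the question:
library(readxl)

path <- "workbook.xlsx"
tabs <- lapply(excel_sheets(path), function(s) {
  tab <- read_excel(path, sheet = s)
  tab[!is.na(tab$id), ]   # keep rows where the id column is filled
})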

find common rows between two dataframes based on two columns using bash

I found this very difficult to solve in bash - I have two files, and I want to find the rows they have in common based on two columns.
f1.csv:
col1,col2,col3,col4
Dalir,Cpne1,down,2174
Fendrr,Aco2,up,280
Cpne1,Tox1,down,8900
f2.csv:
col1,col2,col3,col4,col5,col6
Linc,Rmo,ch2,ch2,p,l
Tox1,Cpne1,ch1,ch2,l,p
So basically the code should look only at the first two columns of the files and check whether the pairs are the same (the order within a pair is not important). You can see that the first file has
Cpne1,Tox1 in its third row and the second file has Tox1,Cpne1 in its second row - so this pair should be printed in the output from the second file.
Desired output:
Tox1,Cpne1
Unfortunately, I have not been able to develop a bash command for this - it would be great if you could help me with this. Thanks
Just adding the explanation to oguz' fine answer in the comments above:
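The one-liner being explained is not shown above; reconstructed from the explanation that follows, it would be:
awk 'BEGIN{FS=OFS=","} NR==FNR{pair[$1,$2];next} ($1,$2) in pair||($2,$1) in pair{print $1,$2}' f1.csv f2.csv
(Since both files share the header pair col1,col2, that header line will also print unless the first rows are skipped.)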
BEGIN{FS=OFS=","} defines , to be the separator for both input and output.
NR==FNR{pair[$1,$2];next} while the record number of the entire input matches the current file's record number (in other words, for the first file) add an element with the first and second field as index to the array pair.
($1,$2) in pair||($2,$1) in pair{print $1,$2} operating on the second file, check if field one and two in any order are present as index in the array pair, and print them if they are.

R - conditionally modify a dataframe

I want to check to see if a dataframe has a certain number of columns, and then conditionally modify the dataframe based on the number of columns it has.
If I try
ifelse(ncol(Table1)=="desired number of columns", Table1[,ColumnsSelected], Table1[,ColumnsSelected2])
I get Table 1 back, but only one column, and it looks weird. Is there a way that I can change this to make it so that I can return a dataframe with the desired columns based on the number of columns in the dataframe?
I have roughly 700 tables that have 2 types of formatting that I would prefer not to have to individually scan to reformat.
Please advise
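A likely explanation, sketched here rather than taken from an answer: ifelse() is vectorized and returns a result the same shape as its test - a single logical in this case - so only the first column of the chosen table comes back. A plain if/else returns the whole dataframe; desiredCols below is a placeholder for the real column count:
desiredCols <- 10   # placeholder: the desired number of columns
Table1 <- if (ncol(Table1) == desiredCols) {
  Table1[, ColumnsSelected]
} else {
  Table1[, ColumnsSelected2]
}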

Expression to Exclude rows that have a specific column blank

I want to exclude entire rows that have blanks for a specific column. I don't want to show a row that has the first name column blank. I was thinking maybe a case statement would help with this:
case when [First Name] = blank then exclude???
If this is SQL, in general you could do this:
SELECT *
FROM myTable
WHERE myCol IS NOT NULL
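If blank can also mean an empty string rather than only NULL (an assumption, since the database isn't specified), the filter can cover both:
SELECT *
FROM myTable
WHERE myCol IS NOT NULL
  AND myCol <> ''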
I had to add a calculated column that would be used as a flag, and then filter on the flag.
if(Len([First Name])>0,1,0)
Now I am able to filter on the calculated column to show only rows that have 1 in it (which will be the rows where First Name is not empty).
This doesn't delete the row from the dataset, but has the same effect in visualizations as if it had been.
