Remove rows where value is *not* in column on other sheet - formula

I have a LibreOffice Calc file with two sheets. Sheet 2 has just one column A with lots of numbers. In sheet 1, column A of every row also holds a number. I want to remove all rows from sheet 1 that have a value in column A which does not appear anywhere in column A of sheet 2.
Filters don't seem to do the trick, as they don't have a "value must be contained in some column" operator.
Any ideas?

Enter the following formula in cell B1 of Sheet1:
=IF(ISNA(VLOOKUP(A1,Sheet2.A:A,1, 0)),"",A1)
Then drag to fill this formula down column B. Column B is now empty for every row whose column A value does not occur in column A of Sheet2.
To remove the empty rows, sort on column B (Data -> Sort). Then select and delete the empty rows (Edit -> Delete Rows).
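For anyone who would rather check the logic outside Calc, here is a minimal Python sketch of the same idea: keep a row only if its column A value occurs in column A of the other sheet. The names sheet1, sheet2_col_a and the sample values are made up for illustration and are not from the original file.

```python
# Hypothetical stand-ins for the two sheets (not the asker's data).
# Each tuple is one row of Sheet1; the first field plays the role of column A.
sheet1 = [(101, "alpha"), (205, "beta"), (309, "gamma"), (412, "delta")]
sheet2_col_a = [205, 412, 999]

# A set makes the "does this value occur anywhere in Sheet2 column A" test cheap.
allowed = set(sheet2_col_a)

# Keep only the rows whose column A value appears in Sheet2 column A,
# which is what the VLOOKUP / ISNA formula decides row by row.
kept_rows = [row for row in sheet1 if row[0] in allowed]

print(kept_rows)
```

The set lookup does the job of VLOOKUP with an exact match, and the list comprehension replaces the sort-and-delete step.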

Related

How to split rows within a dataframe for a target column with multiple/nested values

Given a dataframe in which one column x holds nested or multiple values for some rows, how would I, for each of those rows, append duplicate rows to the dataframe, except that each duplicate corresponds to a single value within x?
To try to explain better, see "mock dataframe pre-transform" below. Row 1 has the values "webui, cli, mobile" for column "module", and what I want is to append three near copies of row 1 to the dataframe: one with module value "webui", one with module value "cli" and one with module value "mobile". I then also want to remove the original row 1. A similar operation would occur for row 4, such that the final dataframe would have 7 rows (see "mock dataframe post-transform" below).
mock dataframe pre-transform
mock dataframe post-transform
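The transformation described above can be sketched in plain Python, treating the dataframe as a list of dicts. This is only an illustration under assumptions: the function name split_rows, the "id" column and the sample rows are invented here, chosen so that the row counts match the description (4 rows in, 7 rows out).

```python
def split_rows(rows, column, sep=","):
    """Return a new list with one row per value found in `column`."""
    result = []
    for row in rows:
        # A row with a single value yields one copy; "a, b, c" yields three.
        for value in str(row[column]).split(sep):
            new_row = dict(row)              # copy, so the input is not mutated
            new_row[column] = value.strip()  # drop the space after each separator
            result.append(new_row)
    return result


# Hypothetical mock data: rows 1 and 4 carry multiple module values.
rows = [
    {"id": 1, "module": "webui, cli, mobile"},
    {"id": 2, "module": "webui"},
    {"id": 3, "module": "cli"},
    {"id": 4, "module": "cli, mobile"},
]

expanded = split_rows(rows, "module")
print(len(expanded))
for row in expanded:
    print(row)
```

Because the result is built as a new list, the original multi-valued rows are dropped automatically. If the data lives in pandas, splitting the column into lists and then calling DataFrame.explode is the usual way to get the same effect.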

R - Extract out multiple specific rows into another variable

Press here for Dataset relating to question
How do you extract only the Male rows into another dataset whilst keeping the rownames & column names intact?
I think
mydf[mydf["sex"] == "Male", ]
has the effect you want (where mydf is the name of your dataframe). Note that mydf["sex"] is the column of values labeled "sex" and mydf["sex"] == "Male" is a column of Boolean values (TRUE where the value is "Male").
The comma with nothing between it and the right square bracket, as in mydf[..., ], is important: it means select all columns.

How to sort the first 20 rows in first column in alphabetical order in a data frame

I'm new to R coding and I got stuck while doing exercises. In my data frame, the first row holds the patients (e.g. patient 1, patient 2, etc.) and the first column holds the gene names (e.g. gene abc123, gene def456). What I want to know is how to sort the first 20 rows in column 1 in alphabetical order. Thanks
EDIT
I have put up a screenshot of the file in Excel, and I am trying to extract the ones in the red box in alphabetical order. I am unsure what to call column 1 in the console as it doesn't have a heading. In the file provided, each row represents expression values for a single gene, and each column represents expression values for a single sample (patient).
The first column of each row is the gene identifier: (gene-symbol|entrez ID)
e.g. "A2M|2" (A2M is the gene-symbol and 2 is the entrez database identifier for alpha 2 macroglobulin)
Each sample identifier is formatted as: TCGA-ID_Tissue
where the Tissue is either "TissueA" or "TissueB" e.g. "TCGA-AA-3548_TissueA"
The question is "Sort the gene names alphabetically (A-Z) and print out the first 20 gene names"
screenshot of the table

Change all cells marked X into cell value shown in column P

I need an easy way to convert all X's in a column into the value shown in a cell.
Basically we want to sell multiple products to a client, with a target order value split among the relevant products. I have used a COUNTA formula to show how many columns are not blank, then a simple division to spread the total value over the non-blank columns (if 2 columns are marked X and the target value is 10k, that is 10,000 / 2). Now I need to change all the X's into the figure shown in the cell, as shown in the pic.
I can't for the life of me think of an easy way of doing it, but surely there is one?
Screen shot of sheet
You need 2 sets of columns. Your first set has the X's and the blanks to mark which categories are applicable. The second set has the calculated values for the selected categories.
In the value columns, you can use a formula like the following for the first data row and first category:
=IF(E2 = "x", $D2, 0)
This assumes that "D" is your "Dvided total" column, "E" is your "Dairy & S..." column, and row 2 is your first data row.
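To make the arithmetic behind the two sets of columns concrete, here is a small Python sketch of one row: count the marked categories, divide the target value by that count, and write the share wherever there was an X. The function name allocate and the sample row are assumptions for illustration only.

```python
def allocate(marks, target):
    """Replace each 'x' mark with an equal share of `target`; blanks become 0."""
    # True where the category is marked; the comparison ignores case.
    marked = [mark.strip().lower() == "x" for mark in marks]
    count = sum(marked)                      # plays the role of the COUNTA step
    share = target / count if count else 0   # the divided total; 0 if nothing is marked
    # Same decision as =IF(E2 = "x", $D2, 0), applied to every category.
    return [share if is_marked else 0 for is_marked in marked]


# Hypothetical row: two of four categories marked, target value 10,000.
values = allocate(["X", "", "X", ""], 10_000)
print(values)  # [5000.0, 0, 5000.0, 0]
```

Keeping the marks and the computed values in separate lists mirrors the advice above to use one set of columns for the X's and another for the amounts.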

extract columns that don't have a header or name in R

I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the second two.
I know how to do it when the columns have names, e.g. plot(data$V1, data$V2), but in this case they do not. How do I access each column individually when they do not have names?
Thanks
Why not give them sensible names?
names(data) <- c("This", "That", "Other")
plot(data$This, data$That)
That's a better solution than using the column number: names are meaningful, and if your data later has a different number of columns, code that relies on positions may break in several places. Give your data the correct names, and as long as you always refer to data$This your code will keep working.
I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The 1st number in brackets refers to rows, the second to columns. Here, I didn't use a "1st number" so all rows of column 4 are selected, i.e., the whole column.
This is easy to remember since it follows matrix notation. E.g., a 4x3 matrix has 4 rows and 3 columns. Thus, when I want to select the element in the 1st row of the 3rd column, I could do something like matrix[1,3]
