Conditional removal of rows in R - r

I have a data frame with 2 columns and 26 rows, the first column is composed of characters while the second column is composed of numbers.
I also have a vector with a random selection of 5 characters.
I want to sum the numbers from column two of the 5 random characters.
How can I calculate this sum?

We can use aggregate to get the sum of ints for each character:
aggregate(ints ~ char, data1, sum)

Maybe what you need is:
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
This sums the ints values in data1 for the rows whose char is present in sample1.
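As a small worked example of the %in% approach, here is a sketch with made-up data. The contents of data1 and sample1 are assumptions (the question did not post any data), and sample1 is fixed rather than drawn with sample() so the result is reproducible.

```r
# Hypothetical stand-ins for the question's data1 and sample1
data1 <- data.frame(char = letters, ints = 1:26, stringsAsFactors = FALSE)
sample1 <- c("b", "e", "k", "q", "z")  # fixed here instead of sample(letters, 5)

# Keep the ints of the rows whose char is in sample1, then sum them
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
result
# [1] 61
```

Here b, e, k, q and z sit at positions 2, 5, 11, 17 and 26, so the sum is 61.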

Related

how to divide the value in each cell of a .csv by the value in another cell across multiple rows and variables in R?

I have a .csv file of 39 variables and 713 rows, each containing a count of plastic items. I have another column which is the survey length, and I want to standardise each count of items by a survey length of 100. I am unsure how to create a loop to run through each row and cell individually to do this. Many also have NA values.
Any ideas would be great.
Thank you.
Consider applying the formula directly to the columns, with no need for a loop:
# RETRIEVE ALL COLUMN NAMES (MINUS SURVEY LENGTH)
vars <- names(df)[!grepl("survey_length", names(df))]
# EXPAND SINGLE COLUMN TO EQUAL DIMENSION OF DATA FRAME
survey_length_mat <- matrix(df$survey_length, ncol=length(vars), nrow=nrow(df))
# APPLY FORMULA
df[vars] <- (df[vars] / survey_length_mat) * 100
df
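To see how this behaves with NA values, here is a sketch on a toy data frame. The column names bottles and bags and all the numbers are made up for illustration. It also uses a slightly shorter variant of the same idea: dividing by the survey_length vector directly, which R recycles down each column, instead of building the matrix first.

```r
# Hypothetical toy data: two item counts plus the survey length
df <- data.frame(
  bottles = c(10, NA, 4),
  bags = c(5, 2, NA),
  survey_length = c(50, 100, 200)
)

# All columns except the survey length
vars <- setdiff(names(df), "survey_length")

# The divisor vector has one value per row, so row i is divided by survey_length[i]
df[vars] <- df[vars] / df$survey_length * 100
df
```

NA counts simply stay NA after the division, so no special handling is needed for them.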

Summing values in two different columns in R

I have a dataset in which I wish to sum each value in column n, with its corresponding value in column (n+(ncol/2)); i.e., so I can sum a value in column 1 row 1 with a value in column 12 row 1, for a dataset with 22 columns, and repeat this until column 11 is summed with column 22. The solution needs to work for hundreds of rows.
How do I do this using R, while ignoring the column names?
Suppose your data is
d <- setNames(as.data.frame(matrix(rnorm(100 * 22), ncol = 22)), LETTERS[1:22])
You can do a simple matrix addition using numbers to select the columns:
output <- d[, 1:11] + d[, 12:22]
so, e.g.
all.equal(output[,1], d[,1] + d[,12])
# [1] TRUE
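If the number of columns is not always 22, the same addition can be written in terms of ncol(d). This is only a sketch: the variable name half is an assumption, and it requires an even number of columns.

```r
set.seed(1)  # only so the example is reproducible
d <- setNames(as.data.frame(matrix(rnorm(100 * 22), ncol = 22)), LETTERS[1:22])

# Split point: column n is paired with column n + half
half <- ncol(d) / 2
stopifnot(half == round(half))  # needs an even number of columns

# Positional addition of the first half and the second half
output <- d[, seq_len(half)] + d[, half + seq_len(half)]
dim(output)
# [1] 100  11
```

The result keeps the column names of the first half (A to K here), since the columns are matched by position and not by name.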

How to divide total number of rows through matching rows in R

Let's say I have a df with 100 rows. 25 of these rows match a specific criteria. I want to divide the total number of my df through my matching rows and add the value to a vector.
e.g. 100/25 = 4 ===> c(4)
x<-1:100
Criteria:
<=25
avector<-length(x)/sum(x<=25)
If you need to append to an existing vector use append.
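For instance, a minimal sketch of the append step, assuming a hypothetical vector named ratios that already holds earlier results:

```r
x <- 1:100

# Hypothetical existing vector of earlier results
ratios <- c(2, 5)

# Total number of elements divided by the number that match the criterion
ratios <- append(ratios, length(x) / sum(x <= 25))
ratios
# [1] 2 5 4
```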
I think I found a solution (total number of rows divided by the number of matching rows):
ratio <- nrow(data) / nrow(subset(data, var1 > 999))

How to subset a data frame by taking only the Non NA values of 2 columns in this data frame

I am trying to subset a data frame by taking only the non-NA values of 2 columns of my data frame:
Subs1<-subset(DATA,DATA[,2][!is.na(DATA[,2])] & DATA[,3][!is.na(DATA[,3])])
but it gives me an error: longer object length is not a multiple of shorter object length.
How can I construct a subset which is composed of the non-NA values of column 2 AND column 3?
Thanks a lot.
Try this:
Subs1<-subset(DATA, (!is.na(DATA[,2])) & (!is.na(DATA[,3])))
The second argument of subset is a logical vector of length nrow(DATA), indicating whether to keep the corresponding row.
The na.omit function can be an answer to your question. Note that indexing with DATA[2:3] keeps only columns 2 and 3 in the result:
Subs1 <- na.omit(DATA[2:3])
[https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html]
Here is an example.
a, b and c are 3 vectors, of which a and b each have a missing value.
Once they are created, I use cbind to bind them into one matrix, which is then converted to a data frame.
The final result is a data frame where 2 out of 3 columns have a missing value.
So we need to keep only the rows with complete cases. DATA[complete.cases(DATA), ] keeps only the rows that have no missing value in any column. Subs1 holds these complete rows.
a <- c(1,NA,2)
b <- c(NA,1,2)
c <- c(1,2,3)
DATA <- as.data.frame(cbind(a,b,c))
Subs1 <- DATA[complete.cases(DATA), ]
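Since the question only asks about columns 2 and 3, complete.cases can also be restricted to those two columns while still returning every column. A sketch with made-up data (the values, and the helper name keep, are assumptions):

```r
# Hypothetical data: column a has an NA that should NOT cause a row to be dropped
DATA <- data.frame(
  a = c(NA, 1, 2, 3),
  b = c(1, NA, 2, 3),
  c = c(1, 2, NA, 3)
)

# TRUE for rows where columns 2 and 3 are both non-NA
keep <- complete.cases(DATA[, 2:3])

# All columns are kept; only the rows are filtered
Subs1 <- DATA[keep, ]
Subs1
```

Rows 1 and 4 are kept. Row 1 survives even though a is NA there, because only columns 2 and 3 are checked.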

Subsetting a dataframe

I have a dataframe with 23000 rows and 8 columns
I want to subset it using only the unique identifiers that are in column 1. I do this with:
total_res2 <- unique(total_res['Entrez.ID']);
This produces 17,000 rows with only the information from column 1.
I am wondering how to extract the unique rows, based on this column and also take the information from the other 7 columns using only these unique rows.
This returns the rows of total_res containing the first occurrences of each Entrez.ID value:
subset(total_res, ! duplicated( Entrez.ID ) )
Or, if you meant that you only want the rows whose Entrez.ID occurs exactly once:
subset(total_res, ave(seq_along(Entrez.ID), Entrez.ID, FUN = length) == 1 )
Next time please provide test data and expected output.
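To make the difference between the two readings concrete, here is a sketch on made-up test data. The column value and the result names first_rows and singletons are assumptions for illustration.

```r
# Hypothetical test data: ID 1 occurs once, ID 2 twice, ID 3 three times
total_res <- data.frame(
  Entrez.ID = c(1, 2, 2, 3, 3, 3),
  value = letters[1:6],
  stringsAsFactors = FALSE
)

# First occurrence of each Entrez.ID, with all other columns kept
first_rows <- subset(total_res, !duplicated(Entrez.ID))

# Only the rows whose Entrez.ID appears exactly once in the data
singletons <- subset(total_res, ave(seq_along(Entrez.ID), Entrez.ID, FUN = length) == 1)

first_rows$value
# [1] "a" "b" "d"
singletons$value
# [1] "a"
```

The first form returns one row per identifier, which matches the row count that unique() gave on the single column. The second form drops every identifier that is repeated at all.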
