R: Extract top nth values with ID/name - r

I have a data frame with 21 variables and 1200 observations. The first column is the ID name for each species and column 21 is the total count of all the times each species was seen across multiple sites.
Example columns: ID, RM1, RM2, RM10, Total
Each row holds an ID name, the counts per river mile, and the total count.
All I want is a list of the top 20 (or 100 for that matter) most abundant species and their total count. How do I do this?
This is driving me crazy and I don't want to do it in Excel - there must be a way in R.

Sort your data frame (let's call it df) by Total in decreasing order, then take the top 100 rows:
head(df[order(df$Total,decreasing = TRUE), ], 100)
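If only the species name and its total are wanted, the same ordering can be combined with a column selection. A minimal sketch, assuming the columns are named ID and Total as in the question; the toy data and the names n and top_species are made up for illustration:

```r
# Toy data standing in for the real 1200-row data frame (values are invented)
df <- data.frame(
  ID    = c("sp_a", "sp_b", "sp_c", "sp_d"),
  RM1   = c(1, 5, 0, 2),
  RM2   = c(3, 4, 7, 0),
  Total = c(4, 9, 7, 2),
  stringsAsFactors = FALSE
)

n <- 2  # number of top species to keep (20 or 100 on the real data)

# Order rows by Total, largest first, and keep only the ID and Total columns
top_species <- head(df[order(df$Total, decreasing = TRUE), c("ID", "Total")], n)
top_species
```

Ties in Total are kept in whatever order order() returns them, so the cut-off at row n is arbitrary when several species share the same count.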

Related

Collapse and Sum Data along multiple groupings in R

I have the following data table in R, which I need to collapse for streamlined data processing. I can do this manually, but I am looking for the most efficient way possible. The data frame looks like this:
and so on. Each age group has 4 observations, 2 male and 2 female (1 of each type). And region consists of city1, city2, city3, etc. which are all ordered the same as the example above. After all age groups are exhausted, the next cityX begins.
I need to combine gender into the total, summing males and females (within type). I also need to combine all age groups to give a population total (sum all age groups). I need to keep type separate, and then later combine them as an additional column. I want the final rows output to be the region. I need the population totals for each year column. So the final output would be like this:
I know this could be done manually by splitting the data frame repeatedly, but what would be the most efficient way to do this?

Group splitting with specific conditions

I have a data frame with three columns: the name of the data point, the group number assigned to that data point, and the species (the data is animal related, and each data point belongs to one of two species).
Any given row looks like this
Name | Group Number | Species
Data Point A | 3 | 1
I would like to split groups only if that group consists of at least 90% of one species, e.g. if group 3 is 10 rows long and has 9 rows belonging to either species 1 or species 2, then it satisfies my requirements and should be split.
I have looked into using the split function as well as the filter functions from dplyr, but I can't seem to figure out how to get R to split groups with this percentage-based requirement. Any help would be useful, thank you!
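One possible base R approach is to compute, per group, the share of its most common species and then split only the groups that meet the threshold. This is a rough sketch, not a definitive answer: the column names Name, Group and Species, the toy data, and the reading of "split" as "split the qualifying rows by group" are all assumptions. The threshold is taken as >= 0.9 so that the 9-out-of-10 example qualifies.

```r
# Toy data (invented): group 3 is 9/10 species 1, group 4 is an even mix
df <- data.frame(
  Name    = paste("Data Point", LETTERS[1:14]),
  Group   = c(rep(3, 10), rep(4, 4)),
  Species = c(rep(1, 9), 2, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Share of the most common species within each group
dominant_share <- tapply(df$Species, df$Group,
                         function(s) max(table(s)) / length(s))

# Group numbers (as character names) that meet the assumed >= 90% threshold
keep <- names(dominant_share)[dominant_share >= 0.9]

# Keep only rows from qualifying groups, then split them by group number
qualifying   <- df[df$Group %in% keep, ]
split_groups <- split(qualifying, qualifying$Group)
names(split_groups)
```

If "above 90%" is meant strictly, changing >= to > is the only adjustment needed.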

Combining two vectors with rbind

I am trying to make a column called ID that contains 5000 rows to act as an identification column for observations on 20 individuals. I want there to be 200 observations for each of the first 10 individuals, and 300 observations for the next ten individuals (because I don't want the same number of observations for each individual). So I made two separate columns:
ID <- data.frame(ID=rep(c(12,122,242,329,595,130,145,245,654,878), each = 200))
ID2 <- data.frame(ID=rep(c(863,425,24,92,75,3,200,300,40,500), each = 300))
Why am I unable to stack one on top of the other (making a single column with all individuals) using rbind?
ID <- rbind(c(ID,ID2))
You were almost there; just don't use c() inside rbind:
ID <- rbind(ID,ID2)
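To see why the c() version does not stack the rows, it helps to inspect what c() returns for two data frames. A small sketch using the two data frames from the question; the names combined and stacked are introduced here only for illustration:

```r
ID  <- data.frame(ID = rep(c(12, 122, 242, 329, 595, 130, 145, 245, 654, 878), each = 200))
ID2 <- data.frame(ID = rep(c(863, 425, 24, 92, 75, 3, 200, 300, 40, 500), each = 300))

# c() flattens the two data frames into a plain list of two vectors,
# so rbind() receives a single list argument instead of two data frames
combined <- c(ID, ID2)
class(combined)
length(combined)

# Passing the data frames as separate arguments stacks them row-wise
stacked <- rbind(ID, ID2)
nrow(stacked)
```

Here nrow(stacked) is 10 * 200 + 10 * 300 = 5000, one row per observation.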

Expand Row with Multiple Observations into Individual Rows

Just wondering if there is a way to expand rows which have multiple observations, into rows of unique observations using R? I have data in an excel spreadsheet with the variable headings: Lease, Line, Bay, Date, Predators, Food.Index, DD, MM, YY.
On some dates, there have been multiple predators (from 1 to 4) recorded in the same row. Other days just have 0. On a day where there has been 4 predators recorded, I would like to somehow transform the data to show four unique observations (instead of one row with 4 recorded under "Predators").
I have 1669 rows of data, and multiple rows need to be expanded.
Example of Data set
Many thanks for your help in advance.
Assuming you have your data in a data.frame, df, one possible solution would be
df.expanded <- df[rep(row.names(df), df$Predators), ]
EDIT: If you also want to keep the rows with 0 predators, you can use pmax to always return at least one:
df.expanded <- df[rep(row.names(df), pmax(df$Predators, 1)),]
Here pmax(df$Predators, 1) returns the elementwise maximum of df$Predators and 1, so every element of the result is at least 1 and equals df$Predators wherever that value is greater than 1.
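A small worked example may make the difference between the two versions clearer. This is only a sketch on invented data with two of the question's columns; the names dropped and kept are hypothetical, and the final pmin step (resetting each expanded row to a single predator) is an optional extra that the answer above does not include:

```r
# Toy data (invented): one day with 0 predators, one with 3, one with 1
df <- data.frame(
  Lease     = c("L1", "L1", "L2"),
  Predators = c(0, 3, 1),
  stringsAsFactors = FALSE
)

# Repeat each row Predators times; rows with 0 predators disappear
dropped <- df[rep(row.names(df), df$Predators), ]
nrow(dropped)

# Repeat each row at least once, so rows with 0 predators are kept
kept <- df[rep(row.names(df), pmax(df$Predators, 1)), ]
nrow(kept)

# Optional: make each expanded row count a single predator (0 stays 0)
kept$Predators <- pmin(kept$Predators, 1)
```

After the optional last step, sum(kept$Predators) still equals the original total number of predators.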

Select subset of unique patient ID

I have a dataset of 19000 rows. The number of unique patient IDs is 15000.
I want a subset containing one row per unique ID, with the other variables kept as in the original dataset.
patnr age and 25 other variables
1 20
2 21
3 16
4 5
19000
How can I do this? For now I can only see how many unique patient IDs are in this database with this command:
length(unique(data$patnr))
Let's say your data.frame is called df. You can use duplicated as follows to select the first instance of each patient ID:
dfUnique <- df[!duplicated(df$patnr), ]
Note that this will drop roughly 4,000 rows and you would lose that information if the other variables are different for the same patient in the second observation.
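A minimal sketch of keeping the first row per patient on toy data, assuming the ID column is named patnr as in the question (the values are invented). duplicated() flags every repeat of an ID after its first appearance, so negating it selects the first row for each patient:

```r
# Toy data (invented): patients 101 and 102 each appear twice
df <- data.frame(
  patnr = c(101, 102, 101, 103, 102),
  age   = c(20, 21, 20, 16, 22)
)

# TRUE for the second and later occurrences of each patnr
duplicated(df$patnr)

# Keep only the first occurrence of each patient, with all other columns
dfUnique <- df[!duplicated(df$patnr), ]
dfUnique

# Sanity check: one row per unique patient ID
nrow(dfUnique) == length(unique(df$patnr))
```

Note that indexing rows with the ID values themselves would only give the right rows if the IDs happened to coincide with row positions, which is why a logical index is used here.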
