This question already has answers here:
How to filter a data frame
(2 answers)
Select rows from a data frame based on values in a vector
(3 answers)
Closed 5 years ago.
I have a data.frame listing locations that were sampled in several years, and I have calculated the number of species at each location. Hence, for each location I have a species count for each year in which it was sampled.
However, not every location has been sampled each year.
It would look something like this:
Location Year Species
1 2007 3
1 2008 10
2 2008 4
2 2009 5
2 2010 6
3 2007 3
3 2008 10
3 2009 5
3 2010 6
I want to select only those locations that have been sampled in every year, and get a data.frame showing only these locations, with their years and species numbers.
In the example above, that would be only location 3.
I searched various sites thoroughly but could not find the answer. I suspect the answer is quite simple, using either aggregate or subset, but the various solutions I tried did not work.
Edit: the answers linked as duplicates do not answer my question. I want to select the various years, but only return the locations that were sampled in all of these years. The linked answers only return the rows for the various years, without conditioning on the location.
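Edit: The answer to my question turned out to be adding a column to my data.frame that counts how often each value in the Location column occurs, and then keeping the rows whose count equals the number of distinct years:
df <- transform(df, freq.loc = ave(seq_len(nrow(df)), Location, FUN = length))
subset(df, freq.loc == length(unique(Year)))  # locations present in every year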
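The row-count approach assumes each location appears at most once per year. A minimal base R sketch that instead checks the years directly, using the example data from the question (the column names come from the table; all_years and complete are helper names introduced here for illustration):

```r
# Example data from the question
df <- data.frame(
  Location = c(1, 1, 2, 2, 2, 3, 3, 3, 3),
  Year     = c(2007, 2008, 2008, 2009, 2010, 2007, 2008, 2009, 2010),
  Species  = c(3, 10, 4, 5, 6, 3, 10, 5, 6)
)

# Every year that appears anywhere in the data
all_years <- unique(df$Year)

# For each row, flag whether its location was sampled in all of those years;
# ave() applies the function per Location and recycles the result over the group
complete <- ave(df$Year, df$Location,
                FUN = function(y) all(all_years %in% y))

# ave() returns a numeric vector here (TRUE becomes 1), so convert back
result <- df[as.logical(complete), ]
result
```

With dplyr, the same filter can be written as df %>% group_by(Location) %>% filter(all(all_years %in% Year)).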
This question already has answers here:
How to get the sum of each four rows of a matrix in R
(3 answers)
Average of n rows
(1 answer)
Closed 8 months ago.
I have a column with hundreds of values. I want to get the average of each block of 4 rows, then move on to the next four rows, and so on. How could I write that loop in RStudio? I attached a simple example here. Also, what should I do if there are NA values, and how can I treat them as zeros?
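No explicit loop is needed. A minimal base R sketch, assuming the values are in a numeric vector x (made-up data, since the attached example is not shown here): replace NA with 0 as requested, build a block index with integer division, and average each block with tapply(). A trailing block shorter than 4 values is averaged over the values it has.

```r
x <- c(1, 2, 3, 4, 5, NA, 7, 8, 9, 10)

# Treat missing values as zeros, as asked in the question
x0 <- replace(x, is.na(x), 0)

# Block index: rows 1-4 -> 0, rows 5-8 -> 1, rows 9-10 -> 2
block <- (seq_along(x0) - 1) %/% 4

# Mean of each block of (up to) 4 values
block_means <- as.vector(tapply(x0, block, mean))
block_means
```

If you would rather ignore NAs than count them as zeros, use tapply(x, block, mean, na.rm = TRUE) on the original x; that averages over the non-missing values in each block only.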
This question already has answers here:
Omit rows containing specific column of NA
(10 answers)
Closed 1 year ago.
I have a data set of car crashes that I am going to analyse based on their locations. However, I want to clean the data first. How would I go about removing crashes that have NA in the region column?
library(dplyr)
# Keep only the rows where the region column is not missing
your_data_frame %>%
  filter(!is.na(region_column))
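For comparison, a base R sketch that does the same without dplyr. The data frame crashes and its region column are hypothetical stand-ins for the real data:

```r
# Hypothetical crash data with some missing regions
crashes <- data.frame(
  id     = 1:4,
  region = c("North", NA, "South", NA)
)

# Keep rows whose region is not NA, using logical indexing
clean <- crashes[!is.na(crashes$region), ]

# Equivalent, using subset()
clean2 <- subset(crashes, !is.na(region))
clean
```

tidyr also offers drop_na(crashes, region), which drops rows with NA in the listed columns.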
This question already has answers here:
Unique combination of all elements from two (or more) vectors
(6 answers)
Closed 2 years ago.
I have been looking around but I have been unable to find a way to do this in R.
I have multiple vectors and I want to combine them into a data frame containing every combination of their values.
For example I have the following vectors:
year <- c("2019","2020")
month <- c("1","2"....)
country <- c("USA","GER","CAN"...)
I want to be able to create a dataframe that is structured as follows:
year month country
2019 1 USA
2019 1 GER
2019 1 CAN
2019 2 USA
....
I have looked around, but I have not found a way to create the desired result.
Any help is greatly appreciated,
Daniel
One option is crossing() from tidyr:
library(tidyr)
crossing(year, month, country)
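Or with expand.grid() from base R (naming the arguments gives the columns meaningful names instead of Var1, Var2, Var3):
expand.grid(year = year, month = month, country = country)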
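Note that expand.grid() varies its first argument fastest and, by default, converts strings to factors, while crossing() sorts its inputs (so CAN would come before USA). A base R sketch that reproduces the row order shown in the question, using short made-up vectors in place of the elided ones:

```r
year    <- c("2019", "2020")
month   <- c("1", "2")
country <- c("USA", "GER", "CAN")

# expand.grid() varies the FIRST argument fastest, so list country first,
# keep strings as character, then reorder the columns to year, month, country
grid <- expand.grid(country = country, month = month, year = year,
                    stringsAsFactors = FALSE)
grid <- grid[, c("year", "month", "country")]
head(grid, 4)
```

tidyr::expand_grid(year, month, country) varies the first column slowest and does not sort its inputs, so it gives the same order directly.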
This question already has answers here:
Remove columns from dataframe where some of values are NA
(8 answers)
Closed 3 years ago.
I have a 17-by-20 matrix (17 rows, 20 columns) that contains only numbers and NAs. I am trying to remove every column that has an NA in any row; this applies to 11 of the 20 columns. I've been searching for an hour and tried several methods, but couldn't get it right.
my.data [ ,!is.na(my.data[ ,1:20])]
To me this makes the most sense, but it gives a 'script too long' error.
One basic approach: any NA in a column makes that column's sum NA, so keep only the columns whose sum is not NA:
my.data[, !is.na(colSums(my.data))]
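Why the attempt in the question misbehaves: is.na(my.data[, 1:20]) is a 17-by-20 logical matrix, not one TRUE/FALSE per column, so it cannot serve as a column index. A minimal sketch on a small made-up matrix that counts NAs per column with colSums(is.na(...)), an equivalent alternative to the colSums() answer:

```r
# Small example: 3 rows, 4 columns, NAs in columns 1 and 4
my.data <- matrix(as.numeric(1:12), nrow = 3)
my.data[2, 1] <- NA
my.data[3, 4] <- NA

# Number of NAs in each column (one value per column)
na_per_col <- colSums(is.na(my.data))

# Keep only NA-free columns; drop = FALSE keeps a matrix even if one column survives
clean <- my.data[, na_per_col == 0, drop = FALSE]
clean
```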
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 4 years ago.
I would like to expand a sample survey and simulate a population. For example, suppose I have the following data sample (very small, just to explain my question):
control weight sex age race
1 2 F 23 W
2 3.1 M 21 B
3 5.3 F 19 W
In this case, control identifies each interviewed person. For example, I would like to get a data frame where control 1 (a white female, 23 years old) is repeated 2 times (2 rows). The difficulty arises when I try to repeat control 2 3.1 times and control 3 5.3 times, while preserving sex, age and race.
There is the survey package, but I don't know whether it has a function for this situation.
How can I find a solution for this problem?
If you need to expand the rows of the dataset based on the value in the weight column, one option is expandRows() from splitstackshape. This is similar to df1[rep(seq_len(nrow(df1)), df1$weight), ].
library(splitstackshape)
expandRows(df1, 'weight')
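One caveat: rep() truncates non-integer counts toward zero, so the rep() version gives controls 2 and 3 only 3 and 5 copies. A base R sketch of two explicit ways to handle fractional weights, using the sample from the question. The second option, randomized rounding, is an assumption about what is wanted (not something the survey package is claimed to do): a weight of 3.1 gives 4 copies with probability 0.1 and 3 otherwise, so the expected number of copies equals the weight.

```r
df1 <- data.frame(
  control = 1:3,
  weight  = c(2, 3.1, 5.3),
  sex     = c("F", "M", "F"),
  age     = c(23, 21, 19),
  race    = c("W", "B", "W")
)

# Option 1: round the weights to whole copies (2, 3, 5 here)
n_round   <- round(df1$weight)
pop_round <- df1[rep(seq_len(nrow(df1)), times = n_round), ]

# Option 2 (assumption): randomized rounding; keep floor(weight) copies and
# add one more with probability equal to the fractional part
set.seed(42)
frac     <- df1$weight - floor(df1$weight)
n_rand   <- floor(df1$weight) + (runif(nrow(df1)) < frac)
pop_rand <- df1[rep(seq_len(nrow(df1)), times = n_rand), ]
```

Each simulated person keeps the sex, age and race of the control it was copied from; only the number of copies differs between the two options.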