How do I keep just one observation when two columns of my dataset contain duplicated values? For example, in the dataset below, rows 1 and 2 have the same values in Sepal.Length and Petal.Length: (5.1, 1.4) and (5.1, 1.4).
I want to remove the second row and keep only the first.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 5.1 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 5.0 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Reproducible test data:
test12 <- head(iris)
test12[2,1] <- 5.1
Thanks in advance.
Use duplicated to compare those specific columns:
test12[!duplicated(test12[,c(1,3)]),]
## or referencing the column names themselves:
test12[!duplicated(test12[,c("Sepal.Length","Petal.Length")]),]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 5.0 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
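To see why the negated duplicated() call keeps the first occurrence, it can help to look at the logical vector it produces. The following is a minimal sketch using the reproducible test data from the question; the names dup and deduped are only illustrative.

```r
# Rebuild the example data from the question
test12 <- head(iris)
test12[2, 1] <- 5.1

# duplicated() returns one logical value per row;
# TRUE marks a row that repeats an earlier combination of the two columns
dup <- duplicated(test12[, c("Sepal.Length", "Petal.Length")])
print(dup)

# Negating the vector keeps the first occurrence of each combination
deduped <- test12[!dup, ]
print(nrow(deduped))
```

Only row 2 is flagged, because it is the only row whose (Sepal.Length, Petal.Length) pair already appeared earlier in the data.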
To keep only the first row:
row1 <- test12[1, ]
To drop the second row of your data frame:
dropRow <- test12[-2, ]
Consider this well-known dataset, which is built into R:
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Please notice that it has a column named Sepal.Length.
I defined a variable with the same name. Please consider this code:
library(dplyr)
table = iris
Sepal.Length = 0
table2 = table %>% mutate(new = Sepal.Length * Petal.Length)
If you check the result:
head(table2)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new
1 5.1 3.5 1.4 0.2 setosa 7.14
2 4.9 3.0 1.4 0.2 setosa 6.86
3 4.7 3.2 1.3 0.2 setosa 6.11
4 4.6 3.1 1.5 0.2 setosa 6.90
5 5.0 3.6 1.4 0.2 setosa 7.00
6 5.4 3.9 1.7 0.4 setosa 9.18
As you can see, the variable Sepal.Length = 0 has been ignored, and the column table$Sepal.Length has been used to create the new column.
How can I use variables in the calculations of the mutate function?
If we want to use an object from the global environment that shares its name with a column in the data, use the .env pronoun:
library(dplyr)
table2 <- table %>%
  mutate(new = Petal.Width * .env$Sepal.Length)
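As a rough illustration of the difference, the sketch below computes the product twice: once with the .data pronoun (which always refers to the column) and once with the .env pronoun (which always refers to the variable outside the data frame). The column names from_column and from_env are made up for this example.

```r
library(dplyr)

# A global variable that shares its name with a column of iris
Sepal.Length <- 0

result <- iris %>%
  mutate(
    from_column = .data$Sepal.Length * Petal.Length,  # uses the iris column
    from_env    = .env$Sepal.Length * Petal.Length    # uses the global variable (0)
  )

head(result[, c("from_column", "from_env")])
```

Being explicit with both pronouns avoids relying on the default lookup order, in which columns of the data frame mask variables of the same name.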
Alternatively, put !! in front of Sepal.Length as noted here https://stackoverflow.com/a/47659705/13015865
packages
library(dplyr)
Solution
table <- iris  # no need to copy the dataset under a new name, but OK
Sepal.Length <- 0
table %>% mutate(new = !!Sepal.Length * Petal.Length)
output (head)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new
1 5.1 3.5 1.4 0.2 setosa 0
2 4.9 3.0 1.4 0.2 setosa 0
3 4.7 3.2 1.3 0.2 setosa 0
4 4.6 3.1 1.5 0.2 setosa 0
5 5.0 3.6 1.4 0.2 setosa 0
6 5.4 3.9 1.7 0.4 setosa 0
I have a data frame and I want to create a subset, <Frame>, containing just the species variable, and display the first five records. How can I subset it in R?
There are 10 rows and 7 columns; one column is species:
netID - fishID - species - tl - wtag - scale
With select:
library(dplyr)
head(
  select(dataframe, species),
  5
)
Assuming your data frame is called df, you can subset it with dplyr:
library(dplyr)
df <- iris[1:10, ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
newdf <- df %>% select(Species) %>% slice(1:5)
Here select() keeps only the Species column, and slice() then picks the range of rows you need. The output of newdf is:
Species
1 setosa
2 setosa
3 setosa
4 setosa
5 setosa
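If you would rather not load dplyr for this, a base R version is possible too. This is just a sketch using the same iris stand-in as above; the name species_only is illustrative.

```r
# Same stand-in data as in the dplyr answer
df <- iris[1:10, ]

# Single-bracket indexing with a column name keeps the result a data frame;
# head(..., 5) then returns the first five rows
species_only <- head(df["Species"], 5)
print(species_only)
```

Note that df$Species or df[, "Species"] would return a plain vector instead of a one-column data frame.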
I have a list of Noxious Weed species in California and a table with all of the species ever seen in a certain site. I want to create a column in the table that will denote which species are Noxious Weeds.
I've been hitting dead ends with this all day and I'm not sure how to continue!
data(iris)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
iris$new_col <- ifelse(iris$Species=="setosa",1,0)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_col
1 5.1 3.5 1.4 0.2 setosa 1
2 4.9 3.0 1.4 0.2 setosa 1
3 4.7 3.2 1.3 0.2 setosa 1
4 4.6 3.1 1.5 0.2 setosa 1
5 5.0 3.6 1.4 0.2 setosa 1
6 5.4 3.9 1.7 0.4 setosa 1
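Since the question involves a whole list of species rather than a single one, the same idea can be extended with %in%. The sketch below assumes the list is available as a character vector; noxious, site and is_noxious are hypothetical names, and iris is again only a stand-in for the real site table.

```r
# Hypothetical stand-in for the list of noxious weed species
noxious <- c("setosa", "virginica")

# Stand-in for the table of species seen at the site
site <- iris

# %in% tests each species against the whole list at once;
# as.integer() turns TRUE/FALSE into 1/0
site$is_noxious <- as.integer(site$Species %in% noxious)

# Cross-tabulate to check the flag per species
print(table(site$Species, site$is_noxious))
```

Unlike ==, %in% never returns NA, so species missing from the list are simply flagged 0.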
I would like to add a prefix to my dataset column names only if they already begin with a certain string, and I would like to do it (if possible) using a dplyr pipeline.
Taking the iris dataset as toy example, I was able to get the expected result with base R (with a quite cumbersome line of code):
data("iris")
colnames(iris)[startsWith(colnames(iris), "Sepal")] <- paste0("YAY_", colnames(iris)[startsWith(colnames(iris), "Sepal")])
head(iris)
YAY_Sepal.Length YAY_Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
In this example, the prefix YAY_ has been added to all the column names starting with Sepal. Is there a way to obtain the same result with a dplyr command/pipeline?
One option is rename_at:
library(tidyverse)
iris %>%
rename_at(vars(starts_with("Sepal")), ~ str_c("YAY_", .))
# YAY_Sepal.Length YAY_Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
# ...
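In dplyr 1.0.0 and later, rename_with() covers the same use case, with the renaming function first and the column selection second. A minimal sketch, assuming a recent dplyr is installed (renamed is just an illustrative name):

```r
library(dplyr)

# Apply the renaming function only to columns whose names start with "Sepal"
renamed <- iris %>%
  rename_with(~ paste0("YAY_", .x), starts_with("Sepal"))

print(names(renamed))
```

The other columns are left untouched because they are not matched by starts_with("Sepal").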
How can I move one row of a data frame to the first row? I want the id row to be the first row, in R.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
id A B C D
We can use grepl to build a logical vector marking the row whose 'Sepal.Length' value is 'id', then use that row as the column names of the dataset while dropping it from the data:
i1 <- grepl("id", df1$Sepal.Length)
setNames(df1[!i1,], unlist(df1[i1,]))
# id A B C D
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
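The question does not show how df1 was built, so the sketch below reconstructs a small version of it under the assumption that every column is character (which is what a stray header row in the data usually causes). The final type.convert() step is an addition, not part of the answer above; it restores numeric columns after the header row has been removed.

```r
# Assumed reconstruction of df1: all columns are character because of the "id" row
df1 <- data.frame(
  Sepal.Length = c("5.1", "4.9", "id"),
  Sepal.Width  = c("3.5", "3.0", "A"),
  Petal.Length = c("1.4", "1.4", "B"),
  Petal.Width  = c("0.2", "0.2", "C"),
  Species      = c("setosa", "setosa", "D"),
  stringsAsFactors = FALSE
)

# Locate the header row and promote it to column names
i1 <- grepl("id", df1$Sepal.Length)
out <- setNames(df1[!i1, ], unlist(df1[i1, ]))

# Convert each column back to its natural type (numeric where possible)
out[] <- lapply(out, type.convert, as.is = TRUE)
str(out)
```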
You could do the following (assuming your ID is the Nth row):
df <- iris # Example of data.frame
myIdrow <- 5 # as an example id row
df2 <- df[c(myIdrow, (1:nrow(df))[-myIdrow]), ]
That said, I would recommend using the ID row as the column names instead.
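To check what the reordering does, here is the same indexing idea on a six-row example, where the effect on the row names is easy to see. It uses setdiff() instead of negative indexing, which is only a stylistic variation.

```r
df <- head(iris)   # small example data frame
myIdrow <- 5       # position of the row to move to the top

# Put the chosen row first, followed by all remaining rows in their original order
df2 <- df[c(myIdrow, setdiff(seq_len(nrow(df)), myIdrow)), ]

print(rownames(df2))
```

The original row names are preserved, so row "5" now appears first, followed by rows "1" to "4" and "6".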