How can I shift one row of a data frame to the first row? - r

How can I move one row of a data frame to the first row? I want the id row to be the first row, in R.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
id A B C D

We can use grepl to create a logical vector that flags the row containing 'id' in 'Sepal.Length', then use that row as the column names of the dataset while dropping it from the original dataset:
i1 <- grepl("id", df1$Sepal.Length)
setNames(df1[!i1,], unlist(df1[i1,]))
# id A B C D
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
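The answer above does not show how df1 was built, so here is a minimal reproducible sketch. The three-row df1 is made up for illustration, and it assumes every column is stored as character (which is what happens when a stray header row ends up inside the data). After promoting the id row to the header, the remaining columns are still character, so the last step converts them back where possible.

```r
# Hypothetical toy data: the "id" row sits at the bottom and forces
# every column to be character.
df1 <- data.frame(
  Sepal.Length = c("5.1", "4.9", "id"),
  Sepal.Width  = c("3.5", "3.0", "A"),
  Species      = c("setosa", "setosa", "B"),
  stringsAsFactors = FALSE
)

# Flag the row that holds the intended column names.
i1 <- df1$Sepal.Length == "id"

# Drop that row and use its values as the new column names.
out <- setNames(df1[!i1, ], unlist(df1[i1, ], use.names = FALSE))

# Convert each column back to its natural type (numeric where possible).
out[] <- lapply(out, type.convert, as.is = TRUE)

str(out)
```

If the columns are factors rather than character, unlist would return the integer codes instead of the labels, so converting to character first would be needed.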

You could do the following (assuming your ID is the Nth row):
df <- iris # Example of data.frame
myIdrow <- 5 # as an example id row
df2 <- df[c(myIdrow, seq_len(nrow(df))[-myIdrow]), ]
That said, I would recommend using the ID row as the column names instead.
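If the position of the id row is not known in advance, it can be looked up by value and then moved to the top. The sketch below is one possible way to do this; the toy df1 and the assumption that the id row is marked by the string "id" in the first column are illustrative, not taken from the asker's real data.

```r
# Hypothetical toy data with the "id" row at the bottom.
df1 <- data.frame(
  Sepal.Length = c("5.1", "4.9", "id"),
  Sepal.Width  = c("3.5", "3.0", "A"),
  Species      = c("setosa", "setosa", "B"),
  stringsAsFactors = FALSE
)

# Find the position of the id row by its value.
id_row <- which(df1$Sepal.Length == "id")

# Put that row first, followed by all other rows in their original order.
moved <- df1[c(id_row, setdiff(seq_len(nrow(df1)), id_row)), ]

# Optionally reset the row names so they run from 1 again.
rownames(moved) <- NULL
moved
```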

Related

Subsetting a dataframe in R

I have a dataframe and I want to create a subset, Frame, of just the species variable and display the first five records. How can I subset it in R?
There are 10 rows and 7 columns. One column is species.
netID - fishID - species - tl - wtag - scale
Use select:
library(dplyr)
head(
select(dataframe, species),
5
)
Assuming your dataframe is called df, you can subset it with dplyr:
library(dplyr)
df <- iris[1:10,]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
newdf <- df %>% select(Species) %>% slice(1:5)
Here select keeps only the Species column, and slice then picks the range of rows you need. The output of newdf is:
Species
1 setosa
2 setosa
3 setosa
4 setosa
5 setosa
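For comparison, the same subset can be taken without dplyr. This is a base R sketch using the same iris-based df as above. Single-bracket indexing with a column name keeps the result a data frame, and drop = FALSE does the same when rows and columns are indexed together.

```r
df <- iris[1:10, ]

# Keep only the Species column (still a data frame), then the first 5 rows.
newdf <- head(df["Species"], 5)

# Equivalent form: index rows and column at once; drop = FALSE prevents
# the single column from collapsing into a plain vector.
same <- df[1:5, "Species", drop = FALSE]

newdf
```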

Renaming columns based on condition about their names

I would like to add a prefix to my dataset column names only if they already begin with a certain string, and I would like to do it (if possible) using a dplyr pipeline.
Taking the iris dataset as toy example, I was able to get the expected result with base R (with a quite cumbersome line of code):
data("iris")
colnames(iris)[startsWith(colnames(iris), "Sepal")] <- paste0("YAY_", colnames(iris)[startsWith(colnames(iris), "Sepal")])
head(iris)
YAY_Sepal.Length YAY_Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
In this example, the prefix YAY_ has been added to all the column names starting with Sepal. Is there a way to obtain the same result with a dplyr command/pipeline?
An option would be rename_at
library(tidyverse)
iris %>%
rename_at(vars(starts_with("Sepal")), ~ str_c("YAY_", .))
# YAY_Sepal.Length YAY_Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
# ...
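Since dplyr 1.0.0 the scoped verbs such as rename_at are superseded, and rename_with covers the same use case. The sketch below assumes dplyr 1.0.0 or later is installed. It uses paste0 instead of str_c so that only dplyr needs to be loaded.

```r
library(dplyr)

# Apply the renaming function only to columns whose names start with "Sepal".
renamed <- iris %>%
  rename_with(~ paste0("YAY_", .x), starts_with("Sepal"))

names(renamed)
```

The original iris object is left untouched here, because the result is assigned to a new name.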

How can I specify the name in a name-value pair inside a for loop?

I have this example script:
library(tidyverse)
placeholder <- c("foo", "bar", "baz", "bash")
x <- 1
for (name in placeholder) {
iris <- iris %>% add_column(name = rep("example", nrow(iris)),
.after = x)
x <- x + 2
}
What I would like to do, is to insert a column after each of the flower characteristics, with columns named from the elements in placeholder.
Expected output:
Sepal.Length foo Sepal.Width bar Petal.Length baz Petal.Width bash Species
1 5.1 example 3.5 example 1.4 example 0.2 example setosa
2 4.9 example 3.0 example 1.4 example 0.2 example setosa
3 4.7 example 3.2 example 1.3 example 0.2 example setosa
4 4.6 example 3.1 example 1.5 example 0.2 example setosa
5 5.0 example 3.6 example 1.4 example 0.2 example setosa
6 5.4 example 3.9 example 1.7 example 0.4 example setosa
However, my for loop names the first inserted column literally name, and then terminates with the error Error: Column name already exists.
What I get:
Sepal.Length name Sepal.Width Petal.Length Petal.Width Species
1 5.1 example 3.5 1.4 0.2 setosa
2 4.9 example 3.0 1.4 0.2 setosa
3 4.7 example 3.2 1.3 0.2 setosa
4 4.6 example 3.1 1.5 0.2 setosa
5 5.0 example 3.6 1.4 0.2 setosa
6 5.4 example 3.9 1.7 0.4 setosa
We can use the := operator with name unquoted (!!) on the left-hand side, so that each value of placeholder is used as the column name:
x <- 1
for (name in placeholder) {
iris <- iris %>% add_column(!! name := rep("example", nrow(iris)),
.after = x)
x <- x + 2
}
head(iris)
# Sepal.Length foo Sepal.Width bar Petal.Length baz Petal.Width bash Species
#1 5.1 example 3.5 example 1.4 example 0.2 example setosa
#2 4.9 example 3.0 example 1.4 example 0.2 example setosa
#3 4.7 example 3.2 example 1.3 example 0.2 example setosa
#4 4.6 example 3.1 example 1.5 example 0.2 example setosa
#5 5.0 example 3.6 example 1.4 example 0.2 example setosa
#6 5.4 example 3.9 example 1.7 example 0.4 example setosa
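One practical detail: the loop above overwrites iris, so running it a second time fails because the columns already exist. A variant of the same loop that works on a copy is sketched below. The names df and pos are illustrative, and only the tibble package is loaded, since add_column lives there.

```r
library(tibble)

placeholder <- c("foo", "bar", "baz", "bash")

df <- iris   # work on a copy so the built-in iris stays unchanged
pos <- 1     # position after which the next column is inserted

for (name in placeholder) {
  # !!name := ... uses the value stored in `name` as the column name,
  # rather than the literal word "name".
  df <- add_column(df, !!name := rep("example", nrow(df)), .after = pos)
  pos <- pos + 2   # skip over the original column and the one just added
}

names(df)
```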

How to select alternating blocks of 12 rows from a data frame in R [duplicate]

Suppose I have a data frame containing 192 rows and I want to select alternating blocks of 12 rows, i.e. select rows 1 to 12, then rows 25 to 36, then rows 49 to 60, and so on.
How can I do that in R?
Using the iris data as an example.
Simply use iris[1:12,] for the first 12 rows:
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#2 4.9 3.0 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 1.4 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
#7 4.6 3.4 1.4 0.3 setosa
#8 5.0 3.4 1.5 0.2 setosa
#9 4.4 2.9 1.4 0.2 setosa
#10 4.9 3.1 1.5 0.1 setosa
#11 5.4 3.7 1.5 0.2 setosa
#12 4.8 3.4 1.6 0.2 setosa
iris[25:36,] for rows 25 to 36, and so on.
Replace iris with the name of your data frame. The index before the comma selects rows and the index after it selects columns, so iris[,1:3] would select the first 3 columns of the data frame.
You could also do this in one vectorized step using R's recycling of logical indices (df is your data frame):
df[rep(c(TRUE, FALSE), each = 12),]
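The recycling approach relies on the 24-element logical pattern being repeated along the rows. An explicit alternative that builds a logical vector of full length is sketched below; the 192-row df with a single id column is made up so that the selected rows are easy to check.

```r
# Hypothetical data frame with 192 rows, as in the question.
df <- data.frame(id = 1:192)

# Block number of each row: 0 for rows 1-12, 1 for rows 13-24, and so on.
block <- (seq_len(nrow(df)) - 1) %/% 12

# Keep the even-numbered blocks: rows 1-12, 25-36, 49-60, ...
keep <- block %% 2 == 0

selected <- df[keep, , drop = FALSE]
head(selected$id, 14)
```

Because keep has one element per row, this version does not depend on the number of rows being a multiple of 24.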

Subsetting observations based on duplicate values

How do I retain just one observation when my dataset contains rows that have duplicate values in two columns? For example, in the dataset below, rows 1 and 2 hold the same values in the Sepal.Length and Petal.Length columns: (5.1, 1.4) and (5.1, 1.4).
I want to remove the second row and retain only the first.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 5.1 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 5.0 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
Reproducible test data:
test12 <- head(iris)
test12[2,1] <- 5.1
Thanks in advance.
Use duplicated to compare those specific columns:
test12[!duplicated(test12[,c(1,3)]),]
## or referencing the column names themselves:
test12[!duplicated(test12[,c("Sepal.Length","Petal.Length")]),]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1 5.1 3.5 1.4 0.2 setosa
#3 4.7 3.2 1.3 0.2 setosa
#4 4.6 3.1 1.5 0.2 setosa
#5 5.0 3.6 5.0 0.2 setosa
#6 5.4 3.9 1.7 0.4 setosa
To keep only the first row:
row1 <- test12[1, ]
To drop the second row of your data frame:
dropRow <- test12[-2, ]
