Display a vector in decreasing order using another vector in R

I have two vectors:
qing.emperors = c("Shunzhi","Kangxi","Yongzheng","Qianlong","Jiaqing")
reign.length = c(18,61,13,60,25)
I want to display the names of the emperors in order of decreasing reign length.
I'm supposed to use the order function, but I'm having trouble using it to get a vector of strings as the result. Any ideas? Thanks!

Another way, with pipes:
library(dplyr)
data <- data.frame(qing.emperors, reign.length)
# sort the rows by reign length, longest first
ordered <- data %>% arrange(desc(reign.length)) %>% select(qing.emperors, reign.length)
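Since the question asks for order() specifically, here is a minimal base R sketch. order() returns the positions that would sort reign.length, and indexing qing.emperors with those positions gives the names as a character vector. The names idx and emperors.by.reign are made up for this example.

```r
qing.emperors <- c("Shunzhi", "Kangxi", "Yongzheng", "Qianlong", "Jiaqing")
reign.length <- c(18, 61, 13, 60, 25)

# order() gives the positions of reign.length from longest to shortest reign
idx <- order(reign.length, decreasing = TRUE)

# indexing the names with those positions reorders them the same way
emperors.by.reign <- qing.emperors[idx]
print(emperors.by.reign)
```

The same indexing works for any pair of vectors of equal length, because order() never touches the values themselves.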

Related

How to use str_extract_all in R in data frame without returning a list?

I have been trying to extract multiple patterns from a sequence in each row of a data frame and return the matches in a new column. The problem is that str_extract_all gives me a list, and I don't know how to unlist it.
I have been trying the code below. Neither unnest nor unlist works inside mutate.
dc <- z %>%
mutate(sequence_match = str_extract_all(z$Sequence,
c("R..S", "R..T", "R..Y")))
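You can return one comma-separated string of values per row. Combine the patterns into a single regular expression first, because a vector of patterns is recycled against the rows rather than applied to every row.
library(dplyr)
library(stringr)
dc = z %>%
# "R..[STY]" matches R..S, R..T and R..Y; toString joins each row's matches
mutate(sequence_match = sapply(str_extract_all(Sequence, "R..[STY]"), toString))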
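To illustrate the shape of the result, here is a sketch on three made-up sequences. It assumes the three patterns can be merged into the single regular expression R..[STY], and it swaps in base R's gregexpr() and regmatches() for str_extract_all() so that it runs without extra packages.

```r
# Made-up data: only the column name Sequence comes from the question
z <- data.frame(Sequence = c("ARAAS", "RGGTRKKY", "AAAA"),
                stringsAsFactors = FALSE)

pattern <- "R..[STY]"

# regmatches() returns a list with one character vector of matches per row
matches <- regmatches(z$Sequence, gregexpr(pattern, z$Sequence))

# toString() collapses each vector into a single comma-separated string
z$sequence_match <- sapply(matches, toString)
print(z)
```

Rows without any match end up as empty strings, so the new column stays an ordinary character vector rather than a list.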

How do I use loop to extract specific headers in dataframe? (using R Studio)

So the data goes like this.
SalesData is a data.frame.
SalesData$Day1 is a numeric column holding each member's sales on the first day. The same goes for $Day2, $Day3 ... $Day50.
I'm trying to make a new column with 50 numeric values, the sum of each day's sales, by using a loop.
I tried
for (i in 1:50)
{SalesData$Dailysum[i] <- sum(SalesData$get(colnames(SalesData)[i]))}
Error: attempt to apply non-function
Apparently I can't use get(colnames(SalesData)[i]) to extract a specific header.
SalesData$colnames(SalesData)[i]
didn't work either.
Is there any way to use a loop to extract each header and use it inside the loop?
We can just use [[ instead of get, as it works with either a numeric index or a column name:
for(i in 1:50)
SalesData$Dailysum[i] <- sum(SalesData[[i]], na.rm = TRUE)
In the tidyverse, we could also use summarise, checking whether each column is numeric:
library(dplyr)
SalesData %>%
summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)))
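As a small worked sketch of the [[ approach, the example below uses a made-up SalesData with four members and three days instead of fifty. Because there is one total per day rather than one per member, it keeps the totals in a separate named vector instead of a column of SalesData; that choice, and the names day_cols and Dailysum2, are assumptions of the example.

```r
# Made-up stand-in for SalesData: 4 members, 3 days (the real data has Day1..Day50)
SalesData <- data.frame(
  Day1 = c(10, 20, NA, 5),
  Day2 = c(1, 2, 3, 4),
  Day3 = c(0, 0, 7, 8)
)

day_cols <- paste0("Day", 1:3)

# Loop version: [[ ]] accepts a column name held in a string
Dailysum <- numeric(length(day_cols))
for (i in seq_along(day_cols)) {
  Dailysum[i] <- sum(SalesData[[day_cols[i]]], na.rm = TRUE)
}
names(Dailysum) <- day_cols

# Vectorised equivalent without a loop
Dailysum2 <- colSums(SalesData[day_cols], na.rm = TRUE)

print(Dailysum)
```

Building the column names with paste0() also protects against other columns, such as a member ID, sitting among the first fifty positions.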

R: Scale a subset of multiple columns (with similar names) with dplyr

I recently moved from ordinary data frame manipulation in R to the tidyverse, but I ran into a problem scaling columns with the scale() function.
My data consists of columns, some of which are numerical and some categorical features. The last column is the y value of the data. I want to scale all numerical columns except the last one.
With the select() function I can write a very short line of code that picks all the numerical columns to be scaled by adding the ends_with("...") argument. But I can't make use of that for scaling. There I have to use transmute(feature1=scale(feature1),feature2=scale(feature2)...) and name each feature individually. This works fine but bloats the code.
So my question is:
Is there a smart way to manipulate the data column by column without having to address every single column name with transmute?
I imagine something like:
transmute(ends_with("...")=scale(ends_with("..."),featureX,featureZ)
(well aware that this does not work)
Many thanks in advance
library(tidyverse)
data("economics")
# add variables that are not numeric
economics[7:9] <- sample(LETTERS[1:10], size = dim(economics)[1], replace = TRUE)
# add a 'y' column (for illustration)
set.seed(1)
economics$y <- rnorm(n = dim(economics)[1])
economics_modified <- economics %>%
select(-y) %>%
transmute_if(is.numeric, scale) %>%
add_column(y = economics$y)
If you want to keep the columns that are not numeric, replace transmute_if with modify_if. (There might be a smarter way to exclude column y from being scaled.)
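One possibly tidier way to leave y out, sketched on a small made-up tibble: newer dplyr releases (assumed here: 1.0.0 or later) provide across(), whose column selection can combine where(is.numeric) with !y. The data and the names toy and toy_scaled are invented for the example, and as.numeric() is only there to drop the matrix attributes that scale() returns.

```r
library(dplyr)

# Made-up data: two numeric features, one categorical feature, and the target y
toy <- tibble(
  f1_num = c(1, 2, 3, 4),
  f2_num = c(10, 20, 30, 40),
  group  = c("a", "b", "a", "b"),
  y      = c(0.5, 1.5, 2.5, 3.5)
)

# Scale every numeric column except y; other columns pass through unchanged
toy_scaled <- toy %>%
  mutate(across(where(is.numeric) & !y, ~ as.numeric(scale(.x))))

print(toy_scaled)
```

Unlike transmute_if, this keeps the non-numeric columns and y in place, so nothing has to be added back afterwards. A helper such as ends_with("...") could take the place of where(is.numeric) if the columns share a suffix.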

Using starts_with in dplyr with a vector of partial column names

I would like to use dplyr to select certain columns that match to a string vector.
one <- seq(1:10)
two <- rnorm(10)
three <- runif(10, 1, 2)
four <- -10:-1
df <- data.frame(one, two, three, four)
vars <- c('on', 'thr')
I want to select only the columns in df whose names start with 'on' or 'thr':
dplyr::select_(df, starts_with(vars))
However, the above is not working.
In older versions of dplyr, the various selection helper functions take only a single character string for matching. You can get around this by combining your strings into one regular expression and using matches:
vars <- paste0("^(", paste(vars, collapse="|"), ")")
select(df, matches(vars))
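To make the combined regular expression concrete, the sketch below prints it and applies it with base R's grepl(), which keeps roughly the same columns as matches() would (matches() ignores case by default, grepl() does not). The data frame is a simplified stand-in for the one in the question, and the names pattern and selected are made up here.

```r
# Simplified stand-in for df; only the column names matter
df <- data.frame(one = 1:10, two = 11:20, three = 21:30, four = -10:-1)
vars <- c("on", "thr")

# Build an anchored alternation from the prefixes
pattern <- paste0("^(", paste(vars, collapse = "|"), ")")
print(pattern)

# Keep the columns whose names match the pattern
selected <- df[grepl(pattern, names(df))]
print(names(selected))
```

The leading ^ is what restricts the match to the start of the name. Without it, any column containing "on" or "thr" anywhere would be kept.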
Presumably you know in advance which column-name matches you want, because you are coding them in, so you could use
select(df, starts_with("on"), starts_with("thr"))
Ah, I see Tony Ladson essentially suggested this already. Depending on your exact use case, though, I don't see a need to get them from a vector.
Here is a solution using starts_with:
df %>%
select(map(c('on', 'thr'),
starts_with,
vars = colnames(.)) %>%
unlist())
Basically, the idea is to apply the starts_with function to the vector of name prefixes by using map.
But to get it to work, one must add the vars argument (the vector of column names), and then unlist the result of map to get the vector of positions.
This solution expands the one of Chrisss to the case where there are several matches for at least one entry.
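In more recent releases, starts_with() itself accepts a character vector and returns the union of the matches, so the original attempt works almost as written. The sketch below assumes dplyr 1.0.0 or later together with tidyselect 1.0.0 or later. The data frame is a simplified stand-in, and the vector is called prefixes here, a name chosen for the example.

```r
library(dplyr)

# Simplified stand-in for df; only the column names matter
df <- data.frame(one = 1:10, two = 11:20, three = 21:30, four = -10:-1)
prefixes <- c("on", "thr")

# starts_with() takes the whole vector and keeps columns matching any prefix
result <- df %>% select(starts_with(prefixes))
print(names(result))
```

On installations that predate those versions, the matches() and map() approaches above remain the fallback.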

How to create a "top ten" vector that keeps labels?

I have a data set that has 655 Rows, and 21 Columns. I'm currently looping through each column and need to find the top ten of each, but when I use the head() function, it doesn't keep the labels (they are names of bacteria, each column is a sample). Is there a way to create sorted subset of data that sorts the row name along with it?
Right now I am doing
topten <- head(sort(genuscounts[,c(1,i)], decreasing = TRUE), n = 10)
but I am getting an error message since column 1 is the list of names.
Thanks!
Because sort() applies to vectors, it's not going to work with your subset genuscounts[,c(1,i)], because the subset has multiple columns. In base R, you'll want to use order():
thisColumn <- genuscounts[,c(1,i)]
topten <- head(thisColumn[order(thisColumn[,2], decreasing = TRUE), ], 10)
You could also use arrange() from the dplyr package, which provides a more user-friendly interface:
library(dplyr)
head(arrange(genuscounts[,c(1,i)], desc(.data[[names(genuscounts)[i]]])), 10)
Because your column name is held in a string and not typed as a bare name, it is looked up through the .data pronoun. The older arrange_() served the same purpose but is deprecated.
Hope this helps!!
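Putting the order() approach inside the loop over samples might look like the sketch below. The data are made up (four genera, two samples), n_top is set to 2 instead of 10 so the effect is visible, and the names n_top, ranked and top_list are hypothetical.

```r
# Made-up stand-in for genuscounts: names in column 1, one column per sample
genuscounts <- data.frame(
  genus   = c("Bacillus", "Vibrio", "Listeria", "Shigella"),
  sample1 = c(5, 40, 12, 0),
  sample2 = c(30, 2, 8, 19),
  stringsAsFactors = FALSE
)

n_top <- 2  # the question uses 10
top_list <- list()

for (i in 2:ncol(genuscounts)) {
  thisColumn <- genuscounts[, c(1, i)]
  # order() reorders whole rows, so each count keeps its genus label
  ranked <- thisColumn[order(thisColumn[, 2], decreasing = TRUE), ]
  top_list[[names(genuscounts)[i]]] <- head(ranked, n_top)
}

print(top_list)
```

Storing each result in a list keyed by sample name avoids overwriting topten on every pass through the loop.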
