Reordering rows in a dataframe in R

I would like to reorder rows in a dataframe based on a specific order. Here is a dummy dataframe (in the long format) that pretty much looks like my data:
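library(tidyr)  # gather() comes from tidyr, not ggplot2 or dplyr
library(dplyr)
set.seed(42)  # make the dummy data reproducible
#data frame: 100 rows, two groups (BM, NBM) of 50
MV <- c(rnorm(50, mean = 10, sd = 1), rnorm(50, mean = 9, sd = 1))
ML <- c(rnorm(50, mean = 12, sd = 1), rnorm(50, mean = 10, sd = 1))
NL <- c(rnorm(50, mean = 10, sd = 1), rnorm(50, mean = 8, sd = 1))
ID <- rep(1:50, times = 2)  # IDs 1-50 within each group
Type <- rep(c("BM", "NBM"), each = 50)
df <- data.frame(ID, Type, MV, ML, NL)
#Here is the dataframe, reshaped to long format (one row per ID and test):
df.gat <- gather(df, "Tests", "Value", 3:5)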
My data is already in the long format to start with (df.gat). The code before that is just to get you a similar dataframe.
Basically, I'd like the data in my dataframe to be ordered as follows: NL, MV, ML.
I have tried approaches from questions such as "Reorder rows using custom order" and "How does one reorder columns in a data frame?", but they are not very convenient given the number of rows in my dataset.
The solution also needs to work if some participants didn't do all the tests.
Any solution?

In that case you could just tweak what I suggested above into:
df.gat[rev(order(df.gat$Tests)),]
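which happens to work here (NL, MV, ML is simply reverse alphabetical order), but not in general. Note that rev() also reverses the original row order within each test.
If you want something generic, you could (re-)create Tests as a factor whose levels are in the desired order: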
df.gat$tests2 <- factor(df.gat$Tests, levels = c('NL', 'MV', 'ML'))
df.gat[order(df.gat$tests2),]
which should give you the same ordering as above.
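To check that the factor approach also handles participants who skipped a test, here is a small sketch on made-up long-format data where participant 2 has no MV row. It also shows match() as an alternative (a swapped-in technique, not part of the answer above): match() returns each row's position in the desired order, and order() sorts by that position. Labels missing from the order vector become NA and are placed last by order().

```r
# Made-up long-format data: participant 2 skipped the MV test
df.gat <- data.frame(
  ID    = c(1, 1, 1, 2, 2),
  Tests = c("MV", "ML", "NL", "ML", "NL"),
  Value = c(10.2, 12.1, 9.8, 11.5, 8.7),
  stringsAsFactors = FALSE
)

# Desired order of the test labels
test_order <- c("NL", "MV", "ML")

# Option 1: factor with explicit levels, then order() on the factor
df.gat$tests2 <- factor(df.gat$Tests, levels = test_order)
sorted_factor <- df.gat[order(df.gat$tests2), ]

# Option 2: match() maps each label to its position in test_order
sorted_match <- df.gat[order(match(df.gat$Tests, test_order)), ]

print(sorted_factor)
```

Because order() is stable, rows with the same test keep their original relative order; add a second key such as df.gat$ID to order() if you want them sorted by participant as well.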

If I am understanding you correctly, you simply want to reorder the columns of your dataframe. Why not do something like this:
library(tidyr)  # for gather()
library(dplyr)  # for %>% and select()
#data frame
MV<-c(rnorm(50,mean=10, sd=1),rnorm(50,mean=9, sd=1))
ML<-c(rnorm(50,mean=12, sd=1),rnorm(50,mean=10, sd=1))
NL<-c(rnorm(50,mean=10, sd=1),rnorm(50,mean=8,sd=1))
ID<-rep(1:50,1)
Type<-rep(c("BM","NBM"),times=1, each=50)
df<-data.frame(ID, Type, MV, ML, NL)
df.gat<-gather(df, "Tests", "Value", 3:5)
# put NL, MV, ML first, followed by all remaining columns (ID, Type)
df <- df %>% select(NL, MV, ML, everything())
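For comparison, the same column reordering can be sketched in base R without dplyr. The small data frame below is made up for illustration; as with select(), this changes only the column order of the wide df, not the row order of the long-format df.gat.

```r
# Made-up wide data frame with the same column names as the question
df <- data.frame(
  ID = 1:2, Type = c("BM", "NBM"),
  MV = c(10.1, 9.2), ML = c(12.3, 10.4), NL = c(9.9, 8.1)
)

first_cols <- c("NL", "MV", "ML")

# Wanted columns first, then every other column in its original order
df_reordered <- df[, c(first_cols, setdiff(names(df), first_cols))]

print(names(df_reordered))
```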

Did you try just placing multiple parameters in an order() call like so?
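df[order(df$MV, df$ML, df$NL), ]
Your df isn't the best demonstration of this, because the values are all continuous decimals, so ties (where the second and third keys come into play) are unlikely.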
Here is a simpler alternative example using discrete values:
df2 <- data.frame(
C1=sample(rep(c(10,20,30) ,20)),
C2=sample(rep(c('A','B','C'),20)))
df2[order(df2$C1,df2$C2),]
I'm not sure why you'd need tidyr::gather() in your example if you're just reordering df, right?
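Since sample() makes the example above different on every run, here is a deterministic sketch with made-up values showing how the keys interact: C2 only breaks ties within equal C1 values, and negating a numeric key sorts that key in decreasing order while the others stay ascending.

```r
df2 <- data.frame(
  C1 = c(20, 10, 20, 10, 30),
  C2 = c("B", "C", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Sort by C1 first; C2 only decides the order among rows with equal C1
by_c1_then_c2 <- df2[order(df2$C1, df2$C2), ]

# Negate the numeric key for descending C1, keeping C2 ascending
by_c1_desc <- df2[order(-df2$C1, df2$C2), ]

print(by_c1_then_c2)
print(by_c1_desc)
```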

Related

Apply R glm predict function on dataframe by group

At the moment I am trying to apply GLM predict on a dataframe. The dataframe is quite large, so I want to apply predict in chunks.
I have found a solution, but it is quite clunky: I first create an empty dataframe and then use rbind. Is there a more efficient way of doing this?
df=data[c(),]
for (x in split(data, factor(sort(rank(row.names(data))%%10)))) {
x["prediction"]=predict(model, x, type="response")
df=rbind(df,x)
}
As the comments mention, an example of what you want your output dataframe to look like would be very helpful.
But I think you can achieve what you want by making a grouping variable first then using 'group_by', something like this:
library(dplyr)
df <- data %>%
  mutate(group = rep_len(1:10, n())) %>% # make an arbitrary grouping factor for this example
  group_by(group) %>% # group by whatever your grouping factor is
  mutate(prediction = predict(model, newdata = pick(everything()), type = "response")) %>% # mutate keeps one prediction per row; pick() needs dplyr >= 1.1.0
  ungroup()
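If dplyr is not a requirement, a base R sketch of the question's chunked approach avoids the empty-dataframe-plus-rbind pattern, which repeatedly copies the accumulated data frame: predict each chunk inside lapply() and bind the pieces once with do.call(rbind, ...). The built-in mtcars data, the logistic model am ~ wt, and the chunk size of 10 are stand-ins for the question's data and model, chosen only for illustration.

```r
# Stand-ins (assumptions): mtcars as `data`, a logistic GLM as `model`
data <- mtcars
model <- glm(am ~ wt, data = data, family = binomial)

chunk_size <- 10
# Contiguous chunk ids 1,1,...,2,2,... so the original row order is preserved
chunk_id <- ceiling(seq_len(nrow(data)) / chunk_size)

# Predict each chunk separately and collect the results in a list
pieces <- lapply(split(data, chunk_id), function(x) {
  x$prediction <- predict(model, newdata = x, type = "response")
  x
})

# Bind once at the end; unname() stops rbind from prefixing row names
# with the chunk id
result <- do.call(rbind, unname(pieces))

head(result)
```

The predictions are identical to predicting on the whole data frame at once; chunking only helps if memory is the constraint.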

Generate dplyr arguments using values in another dataframe

I have data where the factor labels have been provided in separate files. As a result, when I read things in I have data that looks like this:
id <- seq(1,10,1)
factor_x <- as.factor(sample(x = 1:7, size = 10, replace = TRUE))
data <- data.frame(id, factor_x)
And a separate data frame containing the labels for factor_x that looks like this:
code <- seq(1,7,1)
label <- letters[1:7]
factor_x_labels <- data.frame(code, label)
factor_x_labels$label <- as.character(factor_x_labels$label)
I am looking for an efficient way to update factor_x in data frame 'data' with the labels in data frame 'factor_x_labels'.
I have been trying to work with fct_recode from the forcats package or recode from dplyr, but I am running into trouble because (for example) the existing and updated labels need to be pasted in as strings, yet separated by = as a symbol.
@Ronak's comment clearly works (and should perhaps be an answer), but since this post was tagged dplyr, I'm also posting a dplyr solution:
factor_x_labels$code <- as.character(factor_x_labels$code) #this won't work if one of "code" and "factor_x" is numeric but not the other
data %>%
left_join(factor_x_labels, by=c("factor_x"="code")) %>%
rename(factor_x_label = label)
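A base R sketch that relabels the factor directly, using the levels and labels arguments of factor(); this is not necessarily what the comment mentioned above suggested. The data below are fixed, made-up values so the result is predictable. Converting both sides to character avoids the numeric/factor mismatch noted in the join comment.

```r
id <- 1:5
factor_x <- factor(c(3, 1, 7, 3, 2))
data <- data.frame(id, factor_x)

factor_x_labels <- data.frame(code = 1:7, label = letters[1:7],
                              stringsAsFactors = FALSE)

# Map each code to its label; levels absent from the data are still kept
data$factor_x_label <- factor(
  as.character(data$factor_x),
  levels = as.character(factor_x_labels$code),
  labels = factor_x_labels$label
)

print(data)
```

For the fct_recode() route from the question, forcats lets you splice a named vector with !!! instead of pasting strings, e.g. fct_recode(data$factor_x, !!!setNames(as.character(factor_x_labels$code), factor_x_labels$label)).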

In R, how can I filter a data frame by data type using dplyr?

I am still learning R and would really appreciate it if someone could show me a simple way to filter a data frame by data type (e.g. only factors) using dplyr, so that the output is just a list of the variables of the chosen data type.
Thanks in advance!
EDIT:
As it was kindly pointed out, I am missing an example (first post, sorry!). I am trying to do something like the following:
df %>%
filter(typeof(.) == "integer") %>%
names()
The above just returns all of the variables in my data frame, not just those of type integer which is what I would like. I would like to be able to filter for other data types as well, not just integers :)
I would do it like this in base R (no packages needed):
get_names = names(df)[sapply(df, is.factor)]
df = df[, get_names, drop = FALSE]  # drop = FALSE keeps a data frame even if only one column matches
In dplyr, you can do:
df <- df %>%
select_if(is.factor)  # in dplyr >= 1.0.0 the preferred form is select(where(is.factor))
Just to add to @YOLO's answer, you can do it all in one line like this:
df = df[, sapply(df, is.factor, simplify = TRUE), drop = FALSE]
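Since the question asks for the variable names of a chosen type (not only factors), here is a small base R sketch of a reusable helper. names_of_type() is a hypothetical name introduced for illustration, and the data frame is made up. Using is.integer() instead of typeof() matters here: typeof() of a factor is "integer", whereas is.integer() returns FALSE for factors.

```r
df <- data.frame(
  a = 1:3,                       # integer
  b = c(1.5, 2.5, 3.5),          # double
  c = factor(c("x", "y", "x")),  # factor
  d = c("p", "q", "r"),          # character
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the columns for which a type predicate is TRUE
names_of_type <- function(data, predicate) {
  names(data)[vapply(data, predicate, logical(1))]
}

int_cols    <- names_of_type(df, is.integer)  # factor column c is excluded
factor_cols <- names_of_type(df, is.factor)
num_cols    <- names_of_type(df, is.numeric)  # integer and double columns

print(int_cols)
print(num_cols)
```

An equivalent one-liner without a helper is names(Filter(is.integer, df)).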

Making quick calculations on subsets with R

Thanks to all in advance.
I have the following data:
set.seed(123)
data <- data.frame (name=LETTERS[sample(1:26, 500, replace=T)],present=sample(0:1,500,replace = T))
And I want to quickly calculate the percentage of present observations (1's) for each letter. I can do it manually, but I believe there is an easier way to do this:
library(dplyr)
A <- filter(data, name=="A" & present==1)
A2 <- filter(data, name=="A")
data$Percentage[data$name=="A"] <- nrow(A)/nrow(A2)
And so on until I arrive at "Z".
Can I automate this task without having to change the value of the "name" column manually?
Best regards,
We can use prop.table with table to get the proportion of 1's for each letter:
prop.table(table(data), 1)[,2]
To add it as a column, we can expand it by indexing with each row's name:
data$Percentage <- prop.table(table(data), 1)[,2][as.character(data$name)]
Or, as @Lars Lau Raket suggested, we don't need to convert to character:
prop.table(table(data), 1)[,2][data$name]
If we need to create the column with dplyr:
library(dplyr)
data %>%
group_by(name) %>%
mutate(Percentage = mean(present==1))
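For completeness, a base R sketch of the same per-letter percentage on a small made-up data set: ave() broadcasts each group's mean back to every row (like group_by() plus mutate()), while aggregate() returns one row per letter.

```r
data <- data.frame(
  name    = c("A", "B", "A", "C", "B", "A"),
  present = c(1, 0, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Share of 1's per letter, repeated on every row of that letter
data$Percentage <- ave(data$present, data$name, FUN = function(p) mean(p == 1))

# One row per letter instead, if a summary table is enough
summary_by_name <- aggregate(present ~ name, data = data, FUN = mean)

print(data)
print(summary_by_name)
```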

how to use gather_ in tidyr with variables

I'm using tidyr together with shiny and hence need to use dynamic values in tidyr operations.
However, I have trouble using gather_(), which I think was designed for this case.
Minimal example below:
library(tidyr)
df <- data.frame(name=letters[1:5],v1=1:5,v2=10:14,v3=7:11,stringsAsFactors=FALSE)
#works fine
df %>% gather(Measure,Qty,v1:v3)
dyn_1 <- 'Measure'
dyn_2 <- 'Qty'
dyn_err <- 'v1:v3'
dyn_err_1 <- 'v1'
dyn_err_2 <- 'v2'
#error
df %>% gather_(dyn_1,dyn_2,dyn_err)
#error
df %>% gather_(dyn_1,dyn_2,dyn_err_1:dyn_err_2)
After some debugging I realized the error happens in the measure.vars part of melt, but I don't know how to get it to work with the ':' there...
Please help with a solution and explain a little bit so I could learn more.
You are telling gather_ to look for a single column literally named 'v1:v3', not for the separate columns v1, v2 and v3. Simply change dyn_err <- "v1:v3" to dyn_err <- paste("v", seq(3), sep=""). (Likewise, dyn_err_1:dyn_err_2 fails because ':' works on numbers, not on column-name strings.)
If your df has different column names (e.g. var_a, qtr_b, stg_c), you can either extract those column names or use the paste function for whichever variables are of interest.
dyn_err <- colnames(df)[2:4]
or
dyn_err <- paste(c("var", "qtr", "stg"), letters[1:3], sep="_")
You need to look at what column names you want and make the corresponding vector.
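If you want to keep passing a "first:last" string from shiny, a base R sketch can translate it into the vector of column names that gather_ expects. range_to_names() is a hypothetical helper written for this illustration; it looks up both ends with match() and returns every column between them.

```r
df <- data.frame(name = letters[1:5], v1 = 1:5, v2 = 10:14, v3 = 7:11,
                 stringsAsFactors = FALSE)

# Hypothetical helper: turn a "first:last" string into the column names it spans
range_to_names <- function(data, range) {
  ends <- strsplit(range, ":", fixed = TRUE)[[1]]
  idx <- match(ends, names(data))
  if (length(ends) != 2 || anyNA(idx)) {
    stop("range must look like 'first:last' using existing column names")
  }
  names(data)[idx[1]:idx[2]]
}

dyn_err <- "v1:v3"
print(range_to_names(df, dyn_err))
```

With tidyr loaded, this could be used as df %>% gather_(dyn_1, dyn_2, range_to_names(df, dyn_err)). In recent tidyr releases gather_() is deprecated; pivot_longer(df, cols = all_of(range_to_names(df, dyn_err)), names_to = dyn_1, values_to = dyn_2) should be the equivalent, but check it against your installed version.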
