Sum based on range in separate columns - r

I want to calculate the sum of y along the x-axis. The range for summation is contained in the separate columns xmin and xmax.
df <- data.frame (group = c("A","A","A","A","A","B","B","B","B","B" ),
x = c(1,2,3,4,5,1,2,3,4,5),
y= c(1,2,3,2,1,4,5,6,5,4),
xmin=c(2,2,2,2,2,1,1,1,1,1),
xmax=c(4,4,4,4,4,5,5,5,5,5))
For group A that is a range x from 2 to 4, sum{2+3+2}=7
For group B, range x from 1 to 5 sum{4+5+6+5+4}=24
Is there a way to do it?
I have tried around a bit but I'm not sure if the following goes in the right direction
df %>% rowwise() %>% mutate(sumX=sum(df$y[df$x>=df$min & df$x<=df$max]))

Use between to keep only the rows where x lies within [xmin, xmax], then sum y by group with tapply.
subset(df, data.table::between(x, xmin, xmax)) |>
with(tapply(y, group, sum))
# A B
# 7 24
Note: the native pipe |> requires R >= 4.1.
Data:
df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B"), x = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), y = c(1, 2, 3,
2, 1, 4, 5, 6, 5, 4), xmin = c(2, 2, 2, 2, 2, 1, 1, 1, 1, 1),
xmax = c(4, 4, 4, 4, 4, 5, 5, 5, 5, 5)), class = "data.frame", row.names = c(NA,
-10L))
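The same filter-then-sum idea can also be sketched in base R alone, without data.table. This is only an illustration, assuming the bounds are inclusive as in the expected results from the question; the names in_range and sums are introduced here and are not part of the original answer.

```r
# Same sample data as in the question
df <- data.frame(group = rep(c("A", "B"), each = 5),
                 x     = rep(1:5, times = 2),
                 y     = c(1, 2, 3, 2, 1, 4, 5, 6, 5, 4),
                 xmin  = rep(c(2, 1), each = 5),
                 xmax  = rep(c(4, 5), each = 5))

# Logical mask: TRUE where x falls inside the row's own [xmin, xmax] range
in_range <- df$x >= df$xmin & df$x <= df$xmax

# Sum y per group over the rows that passed the filter
sums <- tapply(df$y[in_range], df$group[in_range], sum)
print(sums)
```

Building the mask once keeps the filter readable and makes it easy to inspect which rows were kept before summing.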

Related

Multistep alluvial diagram in R

I have a data frame in R with 6 categories (Pearson1, Spearman1, Kendall1, Pearson2, Spearman2, and Kendall2) and 6 variables (X1, X2, X3, X4, X5, and X6). Each category holds the ranking of the variables from highest to lowest; for example, X1 appears as the least significant in all categories (6th place).
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
Pearson1 = c(6, 3, 2, 5, 4, 1),
Spearman1 = c(6, 5, 1, 2, 3, 4),
Kendall1 = c(6, 5, 1, 2, 3, 4),
Pearson2 = c(6, 5, 1, 2, 3, 4),
Spearman2 = c(6, 5, 1, 2, 4, 3),
Kendall2 = c(6, 5, 1, 2, 3, 4))
I want to create an alluvial diagram that shows how the variables move from one step to the next. The first column (step) should contain the variables, followed by their ranking in each of the 6 steps. My final result looks like this, but only in black and white, with a different texture for each variable if that's possible.
I have tried the following but it's not working
df_long <- reshape2::melt(df, id.vars = "Variable")
alluvial(df_long, col = "Variable", freq = "value",
group = "Variable", border = "white",
hide = c("Variable"))
Using the first example from the documentation as a code template, and adding a "freq" column to the sample df, makes this chart. No reshaping required.
df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
Pearson1 = c(6, 3, 2, 5, 4, 1),
Spearman1 = c(6, 5, 1, 2, 3, 4),
Kendall1 = c(6, 5, 1, 2, 3, 4),
Pearson2 = c(6, 5, 1, 2, 3, 4),
Spearman2 = c(6, 5, 1, 2, 4, 3),
Kendall2 = c(6, 5, 1, 2, 3, 4))
df$freq<-1
alluvial(df[1:7], freq=df$freq, cex = 0.7)
Reverse the vertical order of the first column:
alluvial(df[1:7], freq=df$freq,
cex = 0.7,
ordering = list(
order(df$Variable, decreasing=TRUE),
NULL,
NULL,
NULL,
NULL,
NULL,
NULL
)
)

if else in a loop in R

I want to create a variable region based on a series of similar variables zipid1 to zipid26. My current code is like this:
dat$region <- with(dat, ifelse(zipid1 == 1, 1,
ifelse(zipid2 == 1, 2,
ifelse(zipid3 == 1, 3,
ifelse(zipid4 == 1, 4,
5)))))
How can I write a loop to avoid typing from zipid1 to zipid26? Thanks!
We subset the 'zipid' columns, create a logical matrix by comparing with 1 (== 1), get the column index of the TRUE value with max.col (assuming there is only a single 1 per row), and assign it to create 'region':
dat$region <- max.col(dat[paste0("zipid", 1:26)] == 1, "first")
Using a small reproducible example
max.col(dat[paste0("zipid", 1:5)] == 1, "first")
data
dat <- data.frame(id = 1:5, zipid1 = c(1, 3, 2, 4, 5),
zipid2 = c(2, 1, 3, 5, 4), zipid3 = c(3, 2, 1, 5, 4),
zipid4 = c(4, 3, 6, 2, 1), zipid5 = c(5, 3, 8, 1, 4))
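One thing to watch with max.col: for a row with no 1 at all, every entry of the logical matrix is FALSE, so with ties.method = "first" the result is column 1 rather than a fallback value such as the 5 in the nested ifelse. A minimal sketch of adding such a fallback, using a made-up three-column data frame and a hypothetical fallback code of 4:

```r
# Toy data (hypothetical): the third row has no zipid column equal to 1
dat <- data.frame(id = 1:3,
                  zipid1 = c(1, 0, 0),
                  zipid2 = c(0, 1, 0),
                  zipid3 = c(0, 0, 0))

# Logical matrix: TRUE where a zipid column equals 1
is_one <- dat[paste0("zipid", 1:3)] == 1

# Column index of the first TRUE in each row
region <- max.col(is_one, "first")

# Rows without any TRUE would otherwise report column 1; give them a fallback code
region[rowSums(is_one) == 0] <- 4
print(region)
```

The fallback value is an assumption for illustration; pick whatever code matches the "none of the above" case in your data.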

Make boxplots of columns in R

I am a beginner in R, and have a question about making boxplots of columns in R. I just made a dataframe:
SUS <- data.frame(RD = c(4, 3, 4, 1, 2, 2, 4, 2, 4, 1), TK = c(4, 2, 4, 2, 2, 2, 4, 4, 3, 1),
WK = c(3, 2, 4, 1, 3, 3, 4, 2, 4, 2), NW = c(2, 2, 4, 2, NA, NA, 5, 1, 4, 2),
BW = c(3, 2, 4, 1, 4, 1, 4, 1, 5, 1), EK = c(2, 4, 3, 1, 2, 4, 2, 2, 4, 2),
AN = c(3, 2, 4, 2, 3, 3, 3, 2, 4, 2))
rownames(SUS) <- c('Pleasant to use', 'Unnecessary complex', 'Easy to use',
'Need help of a technical person', 'Different functions well integrated','Various function incohorent', 'Imagine that it is easy to learn',
'Difficult to use', 'Confident during use', 'Long duration untill I could work with it')
I tried a number of times, but I did not succeed in making boxplots for all rows. Can someone help me out here?
You can do it as well using tidyverse
library(tidyverse)
SUS %>%
#create new column and save the row.names in it
mutate(variable = row.names(.)) %>%
#convert your data from wide to long
tidyr::gather("var", "value", 1:7) %>%
#plot it using ggplot2
ggplot(., aes(x = variable, y = value)) +
geom_boxplot()+
theme(axis.text.x = element_text(angle=35,hjust=1))
As @blondeclover says in the comment, boxplot() should work fine for doing a boxplot of each column.
If what you want is a boxplot for each row, then actually your current rows need to be your columns. If you need to do this, you can transpose the data frame before plotting:
SUS.new <- as.data.frame(t(SUS))
boxplot(SUS.new)
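To see what the transpose does before plotting, here is a small sketch on a cut-down, hypothetical version of the data (two statements, three raters). After t(), each former row name becomes a column, which is what boxplot() draws one box for.

```r
# Cut-down stand-in for SUS: rows are statements, columns are raters
SUS_small <- data.frame(RD = c(4, 3), TK = c(4, 2), WK = c(3, 2))
rownames(SUS_small) <- c("Pleasant to use", "Unnecessary complex")

# Transpose so that statements become columns and raters become rows
SUS_t <- as.data.frame(t(SUS_small))

print(dim(SUS_t))    # raters x statements
print(names(SUS_t))  # the former row names

# One box per statement
boxplot(SUS_t)
```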

how to plot the results of a LDA

There are quite a few answers to this question, not only on Stack Overflow but across the internet. However, none of them solved my problem. I have two problems.
I tried to simulate some data for you:
df <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2), var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5,
5), var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3), var3 = c(6,
7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)), .Names = c("Group",
"var1", "var2", "var3"), row.names = c(NA, -15L), class = "data.frame")
then I do as follows:
fit <- lda(Group~., data=df)
plot(fit)
I end up with groups appearing in two different plots.
How can I plot my results in one figure, e.g. like "Linear discriminant analysis plot" or "Linear discriminant analysis plot using ggplot2", or any other beautiful plot?
The plot() function actually calls plot.lda(), the source code of which you can check by running getAnywhere("plot.lda"). This plot() function does quite a lot of processing of the LDA object that you pass in before plotting. As a result, if you want to customize how your plots look, you will probably have to write your own function that extracts information from the lda object and then passes it to a plot function. Here is an example (I don't know much about LDA, so I just trimmed the source code of the default plot.lda and used the very flexible ggplot2 package to create a bunch of plots).
#If you don't have ggplot2 package, here is the code to install it and load it
install.packages("ggplot2")
library("ggplot2")
library("MASS")
#this is your code. The only thing I've changed here is the Group labels because you want a character vector instead of numeric labels
df <- structure(list(Group = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b"),
var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5, 5),
var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3),
var3 = c(6, 7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)),
.Names = c("Group","var1", "var2", "var3"),
row.names = c(NA, -15L), class = "data.frame")
fit <- lda(Group~., data=df)
#here is the custom function I made that extracts the proper information from the LDA object. You might want to write your own version of this to make sure it works with all cases (all I did here was trim the original plot.lda() function, but I might've deleted some code that might be relevant for other examples)
ggplotLDAPrep <- function(x){
if (!is.null(Terms <- x$terms)) {
data <- model.frame(x)
X <- model.matrix(delete.response(Terms), data)
g <- model.response(data)
xint <- match("(Intercept)", colnames(X), nomatch = 0L)
if (xint > 0L)
X <- X[, -xint, drop = FALSE]
}
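#center the predictors on the average of the group means, as plot.lda() does
means <- colMeans(x$means)
#project the centered data onto the linear discriminants
X <- scale(X, center = means, scale = FALSE) %*% x$scaling
#combine the discriminant scores with the group labels
rtrn <- data.frame(X,labels=as.character(g))
return(rtrn)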
}
fitGraph <- ggplotLDAPrep(fit)
#Here are some examples of using ggplot to display your results. If you like what you see, I suggest learning more about ggplot2 so that you can easily customize your plots
#this is similar to the result you get when you ran plot(fit)
ggplot(fitGraph, aes(LD1))+geom_histogram()+facet_wrap(~labels, ncol=1)
#Same as previous, but all the groups are on the same graph
ggplot(fitGraph, aes(LD1,fill=labels))+geom_histogram()
The following plot won't work with your data because you don't have LD2 (two groups give only one discriminant), but it is equivalent to the scatter plot in the external example you provided. I've loaded that example here as a demo:
ldaobject <- lda(Species~., data=iris)
fitGraph <- ggplotLDAPrep(ldaobject)
ggplot(fitGraph, aes(LD1,LD2, color=labels))+geom_point()
I didn't customize the ggplot settings much, but you can make your graphs look like anything you want if you play around with it. Hope this helps!
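As an alternative to trimming plot.lda(), the discriminant scores can probably also be taken from predict(), whose result for an lda fit includes a matrix x of scores. The sketch below uses iris as a stand-in; the names fit_iris and scores are introduced here for illustration, and the scores may be shifted by a constant relative to the custom function above because the centering can differ.

```r
library(MASS)

# Fit LDA on iris (three groups, so two discriminants: LD1 and LD2)
fit_iris <- lda(Species ~ ., data = iris)

# predict() on the fitted object returns a list; element x holds the LD scores
scores <- data.frame(predict(fit_iris)$x, labels = iris$Species)

head(scores)
```

The resulting data frame has the same shape as the output of ggplotLDAPrep(), so it can be passed to the same ggplot() calls, for example ggplot(scores, aes(LD1, LD2, color = labels)) + geom_point().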

Data frame into list

I want to convert my data frame into a list
g<- c( 1, 1, 1, 2, 2, 2, 3, 3, 4, 4)
d<- c (10, 20, 10,10,52,45,45,50,65,58)
mydata <- data.frame (cbind (g, d), colnames = c("g", "d"))
I tried split() but it didn't work!
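One possible reading of the goal is a list with one element per value of g. Under that assumption, split() does work once the data frame is built with named columns directly (the colnames = argument in the call above creates an extra column rather than naming the columns). A minimal sketch; by_group and rows_by_group are names introduced here:

```r
g <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4)
d <- c(10, 20, 10, 10, 52, 45, 45, 50, 65, 58)

# Name the columns directly instead of passing colnames =
mydata <- data.frame(g = g, d = d)

# One list element per value of g, each holding that group's d values
by_group <- split(mydata$d, mydata$g)
print(by_group)

# To keep whole rows per group instead of only d
rows_by_group <- split(mydata, mydata$g)
```

If the goal is simply a list with one element per column, as.list(mydata) would do that instead.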

Resources