Let's consider a very simple data frame containing four groups:
cat <- c(1, 0, 0, 1, 2, 1, 2, 3, 2, 1, 3)
var <- c(10, 5, 3, 2, 5, 1, 2, 10, 50, 2, 30)
df <- data.frame(cat, var)
What I would like to do is use dplyr to plot the distribution of values across those four categories.
I have the feeling that it can easily be done with group_by, but I'm not sure how. Do you know how I can do it?
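A minimal sketch of one possible reading of this question, assuming "distribution" means a per-category summary plus a boxplot. The summary column names (count, med, lo, hi) and the use of ggplot2 for the plot are assumptions for illustration, not something the question specifies.

```r
library(dplyr)
library(ggplot2)

cat <- c(1, 0, 0, 1, 2, 1, 2, 3, 2, 1, 3)
var <- c(10, 5, 3, 2, 5, 1, 2, 10, 50, 2, 30)
df <- data.frame(cat, var)

# Per-category summary of var (column names are illustrative)
dist_summary <- df %>%
  group_by(cat) %>%
  summarise(count = n(), med = median(var), lo = min(var), hi = max(var))

# One box per category; factor() makes ggplot2 treat cat as discrete
p <- ggplot(df, aes(x = factor(cat), y = var)) +
  geom_boxplot() +
  labs(x = "cat", y = "var")
p
```

group_by() only does the grouping for the summary table; the plot gets its groups from the x aesthetic instead.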
I want to calculate the sum of y along the x-axis. The range for summation is contained in the separate columns xmin and xmax.
df <- data.frame (group = c("A","A","A","A","A","B","B","B","B","B" ),
x = c(1,2,3,4,5,1,2,3,4,5),
y= c(1,2,3,2,1,4,5,6,5,4),
xmin=c(2,2,2,2,2,1,1,1,1,1),
xmax=c(4,4,4,4,4,5,5,5,5,5))
For group A that is a range x from 2 to 4, sum{2+3+2}=7
For group B, range x from 1 to 5 sum{4+5+6+5+4}=24
Is there a way to do it?
I have tried a few things, but I'm not sure whether the following goes in the right direction:
df %>% rowwise() %>% mutate(sumX=sum(df$y[df$x>=df$min & df$x<=df$max]))
Use between() from data.table to subset the rows, then sum y within each group using tapply().
subset(df, do.call(data.table::between, c(list(x), list(xmin, xmax)))) |>
with(tapply(y, group, sum))
# A B
# 7 24
Note: this uses the native pipe |>, which requires R >= 4.1.
Data:
df <- structure(list(group = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B"), x = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5), y = c(1, 2, 3,
2, 1, 4, 5, 6, 5, 4), xmin = c(2, 2, 2, 2, 2, 1, 1, 1, 1, 1),
xmax = c(4, 4, 4, 4, 4, 5, 5, 5, 5, 5)), class = "data.frame", row.names = c(NA,
-10L))
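Since the question's attempt used dplyr, here is a minimal dplyr sketch of the same idea, assuming xmin and xmax are constant within each group as in the example data. The output column name sumX is taken from the question's attempt. Grouping replaces rowwise(), and the columns are referenced directly rather than through df$.

```r
library(dplyr)

df <- data.frame(group = rep(c("A", "B"), each = 5),
                 x = rep(1:5, times = 2),
                 y = c(1, 2, 3, 2, 1, 4, 5, 6, 5, 4),
                 xmin = rep(c(2, 1), each = 5),
                 xmax = rep(c(4, 5), each = 5))

# Within each group, keep only the y values whose x lies in [xmin, xmax], then sum
res <- df %>%
  group_by(group) %>%
  summarise(sumX = sum(y[x >= xmin & x <= xmax]))
res
```

For the example data this gives 7 for group A and 24 for group B.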
I want to create a variable region based on a series of similar variables zipid1 to zipid26. My current code is like this:
dat$region <- with(dat, ifelse(zipid1 == 1, 1,
ifelse(zipid2 == 1, 2,
ifelse(zipid3 == 1, 3,
ifelse(zipid4 == 1, 4,
5)))))
How can I write a loop to avoid typing from zipid1 to zipid26? Thanks!
We subset the 'zipid' columns, create a logical matrix by comparing with 1 (== 1), get the column index of the TRUE value in each row with max.col (assuming there is only a single 1 per row), and assign it to create 'region':
dat$region <- max.col(dat[paste0("zipid", 1:26)] == 1, "first")
Using a small reproducible example
max.col(dat[paste0("zipid", 1:5)] == 1, "first")
data
dat <- data.frame(id = 1:5, zipid1 = c(1, 3, 2, 4, 5),
zipid2 = c(2, 1, 3, 5, 4), zipid3 = c(3, 2, 1, 5, 4),
zipid4 = c(4, 3, 6, 2, 1), zipid5 = c(5, 3, 8, 1, 4))
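One caveat, sketched here under the assumption that some rows might contain no 1 at all: such a row is all FALSE in the logical matrix, so max.col(..., "first") returns 1 for it, which looks the same as a genuine match in zipid1. The small data frame and the choice of NA as the fallback value below are hypothetical.

```r
# Hypothetical data: the third row has no 1 in any zipid column
dat <- data.frame(zipid1 = c(1, 0, 0),
                  zipid2 = c(0, 1, 0),
                  zipid3 = c(0, 0, 0))

m <- dat[paste0("zipid", 1:3)] == 1   # logical matrix
region <- max.col(m, "first")         # column index of the first TRUE per row
region[rowSums(m) == 0] <- NA         # rows without any 1 get NA instead of 1
region
```

Replace NA with whatever default fits your data, for example the final value 5 used in the nested ifelse() from the question.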
There are quite a few answers to this question, not only on Stack Overflow but across the internet. However, none of them solved my problem. I have two problems.
I tried to simulate some data for you:
df <- structure(list(Group = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2), var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5,
5), var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3), var3 = c(6,
7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)), .Names = c("Group",
"var1", "var2", "var3"), row.names = c(NA, -15L), class = "data.frame")
then I do as follows:
library(MASS)
fit <- lda(Group~., data=df)
plot(fit)
I end up with the groups appearing in two different plots.
How can I plot my results in one figure, e.g. like "Linear discriminant analysis plot" or "Linear discriminant analysis plot using ggplot2", or any other beautiful plot?
The plot() function actually calls plot.lda(), the source code of which you can check by running getAnywhere("plot.lda"). This function does quite a lot of processing of the LDA object that you pass in before plotting. As a result, if you want to customize how your plots look, you will probably have to write your own function that extracts information from the lda object and then passes it to a plot function. Here is an example (I don't know much about LDA, so I just trimmed the source code of the default plot.lda and used the very flexible ggplot2 package to create a bunch of plots).
#If you don't have ggplot2 package, here is the code to install it and load it
install.packages("ggplot2")
library("ggplot2")
library("MASS")
#this is your code. The only thing I've changed here is the Group labels because you want a character vector instead of numeric labels
df <- structure(list(Group = c("a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b"),
var1 = c(2, 3, 1, 2, 3, 2, 3, 3, 5, 6, 7, 6, 8, 5, 5),
var2 = c(9, 9, 9, 8, 7, 8, 9, 3, 2, 2, 1, 1, 2, 3, 3),
var3 = c(6, 7, 6, 6, 5, 6, 7, 1, 2, 1, 2, 3, 1, 1, 2)),
.Names = c("Group","var1", "var2", "var3"),
row.names = c(NA, -15L), class = "data.frame")
fit <- lda(Group~., data=df)
#here is the custom function I made that extracts the proper information from the LDA object. You might want to write your own version of this to make sure it works with all cases (all I did here was trim the original plot.lda() function, but I might've deleted some code that might be relevant for other examples)
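ggplotLDAPrep <- function(x){
  # The formula interface stores a terms object; rebuild the design matrix from it
  if (is.null(Terms <- x$terms))
    stop("ggplotLDAPrep() only supports lda objects fitted with a formula")
  data <- model.frame(x)
  X <- model.matrix(delete.response(Terms), data)
  g <- model.response(data)
  # Drop the intercept column, if present
  xint <- match("(Intercept)", colnames(X), nomatch = 0L)
  if (xint > 0L)
    X <- X[, -xint, drop = FALSE]
  # Center on the mean of the group means, then project onto the discriminants
  means <- colMeans(x$means)
  X <- scale(X, center = means, scale = FALSE) %*% x$scaling
  # data.frame() keeps the LD columns numeric (cbind() would coerce them to character)
  rtrn <- data.frame(X, labels = as.character(g))
  return(rtrn)
}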
fitGraph <- ggplotLDAPrep(fit)
#Here are some examples of using ggplot to display your results. If you like what you see, I suggest to learn more about ggplot2 and then you can easily customize your plots
#this is similar to the result you get when you ran plot(fit)
ggplot(fitGraph, aes(LD1))+geom_histogram()+facet_wrap(~labels, ncol=1)
#Same as previous, but all the groups are on the same graph
ggplot(fitGraph, aes(LD1,fill=labels))+geom_histogram()
The following plot won't work with your data because, with only two groups, your fit has just one discriminant (LD1) and no LD2. It is equivalent to the scatter plot in the external example you linked, so I've used the iris data from that example here as a demo:
ldaobject <- lda(Species~., data=iris)
fitGraph <- ggplotLDAPrep(ldaobject)
ggplot(fitGraph, aes(LD1,LD2, color=labels))+geom_point()
I didn't customize the ggplot settings much, but you can make your graphs look like anything you want if you play around with it. Hope this helps!
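As an alternative sketch, not the method used above: predict() on an lda fit returns a list whose x element holds the discriminant scores, so it can stand in for the hand-written extraction function. The names fit_iris and scores are illustrative. Note that predict() centers on the prior-weighted group means, so with unequal group sizes the scores may be shifted by a constant compared with ggplotLDAPrep().

```r
library(MASS)
library(ggplot2)

fit_iris <- lda(Species ~ ., data = iris)

# predict() returns class, posterior and x; x holds the LD scores per observation
scores <- data.frame(predict(fit_iris)$x, labels = iris$Species)

# Scatter plot of the first two discriminants, colored by group
p <- ggplot(scores, aes(LD1, LD2, color = labels)) + geom_point()
p
```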
I want to extract pairs of data from a data frame, where they are paired with data that is not in their own column. Each number in column 1 is paired with all the numbers to the right of that column. Likewise numbers in column 2 are only paired with numbers in columns 3 or above.
I have created a script that does it using a bird's nest of 'for' loops but I feel there should be a more elegant way to do it.
Example data:
structure(list(A = 1:3, B = 4:6, C = 7:9), .Names = c("A", "B",
"C"), class = "data.frame", row.names = c(NA, -3L))
Desired output:
structure(list(X1 = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6), X2 = c(4, 5, 6, 7,
8, 9, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 7, 8, 9, 7, 8, 9, 7,
8, 9)), .Names = c("X1", "X2"), row.names = c(NA, 27L), class = "data.frame")
Here's an approach using the data.table package and its very efficient CJ and rbindlist functions (assuming your data set is called df):
library(data.table)
res <- rbindlist(lapply(seq_len(length(df) - 1),
function(i) CJ(df[, i], unlist(df[, -(seq_len(i))]))))
You could then set your column names by reference (if you insist on "X1" and "X2") using setnames
setnames(res, 1:2, c("X1", "X2"))
You can also convert back to data.frame by reference (if you want to match your desired output "exactly") by using setDF()
setDF(res)
Here df is the input dataset
out1 <- do.call(rbind,lapply(1:(ncol(df)-1), function(i) {
x1 <- df[,i:(ncol(df))]
Un1 <-unique(unlist(x1[,-1]))
data.frame(X1=rep(x1[,1], each=length(Un1)), X2= Un1)}))
all.equal(out, out1) #if `out` is the expected output
#[1] TRUE
Another approach:
res <- do.call(rbind, unlist(lapply(seq(ncol(dat) - 1), function(x)
lapply(seq(x + 1, ncol(dat)), function(y)
"names<-"(expand.grid(dat[c(x, y)]), c("X1", "X2")))),
recursive = FALSE))
where dat is the name of your data frame.
You can sort the result with this command:
res[order(res[[1]], res[[2]]), ]
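For reference, a self-contained base R sketch of the same pairing idea, run on the example data from the question. It assumes combn() is used to enumerate the column pairs, which is a swapped-in variation and not what the answers above do; col_pairs is an illustrative name.

```r
df <- data.frame(A = 1:3, B = 4:6, C = 7:9)

# Every pair of columns (i, j) with i < j
col_pairs <- combn(ncol(df), 2, simplify = FALSE)

# Cross each left-hand column with each column to its right
res <- do.call(rbind, lapply(col_pairs, function(p) {
  expand.grid(X1 = df[[p[1]]], X2 = df[[p[2]]])
}))

# Sort to match the row order of the desired output
res <- res[order(res$X1, res$X2), ]
rownames(res) <- NULL
res
```

With three columns of three values each, this yields 3 column pairs times 9 combinations, i.e. the 27 rows of the desired output.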