╔═════╦═════════╦═════╗
║ id ║ seconds ║ ... ║
╠═════╬═════════╬═════╣
║ A ║ 30 ║ ... ║
║ B ║ 20 ║ ... ║
║ ... ║ ... ║ ... ║
║ All ║ 10 ║ ... ║
╚═════╩═════════╩═════╝
I have data where "id" can be "All", which means that it impacts all other ids (it is NOT a "Total").
I need to do charts, and I want the chart to sum the value of id "All" to all others ids, instead of creating a bar to "All" parameter.
I want to do a chart, reading the table like this:
╔═════╦═════════╦═════╗
║ id ║ seconds ║ ... ║
╠═════╬═════════╬═════╣
║ A ║ 40 ║ ... ║
║ B ║ 30 ║ ... ║
║ ... ║ ... ║ ... ║
╚═════╩═════════╩═════╝
Is this possible in BIRT?
Yes, this can be done in several ways.
To choose a solution, more information is needed:
Is there only one "All" id?
Are the other ids unique?
If so, I would check in the dataset's fetch script whether the id value is "All" and, if it is, store its seconds value in a global variable.
You can then add this variable to the seconds of every other row.
Use a table filter or row visibility to hide the "All" row.
A chart can then use this dataset (or a table or crosstab bound to it) as its data source.
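The logic of that fetch-script approach can be sketched in plain JavaScript (BIRT dataset scripts are JavaScript; the array of row objects here is only a stand-in for the real dataset rows):

```javascript
// Sketch of the idea: capture the "All" row's value, add it to every
// other row, and drop the "All" row itself. In BIRT you would do this
// in the dataset's fetch script and a table filter instead.
var rows = [
  { id: "A", seconds: 30 },
  { id: "B", seconds: 20 },
  { id: "All", seconds: 10 }
];

var allSeconds = 0;
for (var i = 0; i < rows.length; i++) {
  if (rows[i].id === "All") allSeconds = rows[i].seconds; // the "global variable"
}

var chartRows = rows
  .filter(function (r) { return r.id !== "All"; })                     // hide "All"
  .map(function (r) { return { id: r.id, seconds: r.seconds + allSeconds }; });

console.log(chartRows); // A -> 40, B -> 30
```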
I apologize in advance, since I am a beginner R user.
I have a big data file with multiple factors (15) and multiple tested samples from various groups within each factor (5). I have calculated the means for each group within each factor. To simplify the presentation of my data, I would like to create a circular plot. I came upon the package 'circlize', and 'circos.trackHist()' is a perfect choice for my purposes. Unfortunately, the guide I am looking at does not provide an example of how to use imported data, but rather creates simulated data from scratch. In addition, it is rather complex for my level, and I would appreciate any support with graphing it. If I have the following data in tabular form in Excel, how could I create a circular plot?
╔═════════╦═══════╦═════════╗
║ Factor  ║ group ║ average ║
╠═════════╬═══════╬═════════╣
║ Factor1 ║ A     ║ 77.53   ║
║ Factor1 ║ B     ║ 54.98   ║
║ Factor1 ║ B     ║ 43.35   ║
║ Factor1 ║ C     ║ 243.0   ║
║ Factor2 ║ A     ║ 91.3    ║
║ Factor2 ║ A     ║ 70.2    ║
║ Factor2 ║ A     ║ 67.93   ║
║ Factor3 ║ C     ║ 16.49   ║
║ Factor3 ║ B     ║ 0       ║
║ Factor3 ║ C     ║ 5.1416  ║
╚═════════╩═══════╩═════════╝
It seems like you need to create a matrix: use the factors as your rows and the groups as your columns, and load the values into it. You might have to build it in a data frame first and convert to a matrix, but it shouldn't be too hard.
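For instance, assuming the Excel sheet has been exported to CSV and read into a data frame, tapply builds that factor-by-group matrix directly, with no explicit loop (a sketch on toy data mirroring the question's layout):

```r
# Toy data in the question's long format (values illustrative)
df <- data.frame(Factor  = c("Factor1", "Factor1", "Factor2", "Factor3"),
                 group   = c("A", "B", "A", "C"),
                 average = c(77.53, 54.98, 91.3, 16.49))

# Rows = factors, columns = groups; repeated combinations are averaged,
# combinations that never occur become NA
m <- tapply(df$average, list(df$Factor, df$group), mean)
```

In the real data you would replace df with something like read.csv("yourfile.csv"); the resulting matrix can then be passed on to circlize.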
I have two datasets, A and B, that contain values and coordinates.
A:
╔═══╦════════════╦═════════════╦═════════════╗
║ ║ name ║ x ║ y ║
╠═══╬════════════╬═════════════╬═════════════╣
║ 1 ║ city ║ 50.3 ║ 4.2 ║
║ 2 ║ farm ║ 14.8 ║ 8.6 ║
║ 3 ║ lake ║ 18.7 ║ 9.8 ║
║ 3 ║ Mountain ║ 44 ║ 9.8 ║
╚═══╩════════════╩═════════════╩═════════════╝
B:
╔═══╦════════════╦═════════════╦═════════════╗
║ ║ Temp ║ x ║ y ║
╠═══╬════════════╬═════════════╬═════════════╣
║ 1 ║ 18 ║ 50.7 ║ 6.2 ║
║ 2 ║ 17.3 ║ 20 ║ 11 ║
║ 3 ║ 15 ║ 15 ║ 9 ║
╚═══╩════════════╩═════════════╩═════════════╝
I would like this, C:
╔═══╦════════════╦═════════════╦═════════════╗
║ ║ Name ║ Temp ║ Distance ║
╠═══╬════════════╬═════════════╬═════════════╣
║ 1 ║ city ║ 18 ║ 2.039608 ║
║ 2 ║ farm ║ 15 ║ 0.447214 ║
║ 3 ║ lake ║ 17.3 ║ 1.769181 ║
║ 4 ║ Mountain ║ 18 ║ 7.605919 ║
╚═══╩════════════╩═════════════╩═════════════╝
I tried this:
A <- read.table(header = TRUE, text = "
Name x y
city 50.3 4.2
farm 14.8 8.6
lake 18.7 9.8
mountain 44 9.8")
B <- read.table(header = TRUE, text = "
Temp x y
18 50.7 6.2
17.3 20 11
15 15 9")
C <- data.frame(Name = character(),
                Temp = numeric(),
                Distance = numeric())
for (i in 1:nrow(A)) {
  x1 <- A[i, ]$x
  y1 <- A[i, ]$y
  min <- 100
  index <- 0
  for (j in 1:nrow(B)) {
    x2 <- B[j, ]$x
    y2 <- B[j, ]$y
    tmp <- sqrt((x2 - x1)^2 + (y2 - y1)^2)
    if (tmp < min) {
      index <- j
      min <- tmp
    }
  }
  df <- list(Name = A[i, ]$Name, Temp = B[index, ]$Temp, Distance = min)
  C <- rbind(C, df)
}
print(C)
But my first dataset has about 1,500,000 rows and my second one about 5,000, and this algorithm is very, very slow. Is there a better way to do it?
If you want a hack in R, you can use R's outer function (and the fact that R is good at vectorization) to efficiently compute the distances of every point in A[, c("x", "y")] from every point in B[, c("x", "y")], i.e., a matrix of distances with the locations in A as rows and the locations in B as columns, e.g.,
A <- read.table(header = TRUE, text = "
Name x y
city 50.3 4.2
farm 14.8 8.6
lake 18.7 9.8
mountain 44 9.8")
B <- read.table(header = TRUE, text = "
Temp x y
18 50.7 6.2
17.3 20 11
15 15 9")
d <- sqrt(outer(A$x, B$x, "-")^2 + outer(A$y, B$y, "-")^2)
d
## [,1] [,2] [,3]
## [1,] 2.039608 31.053663 35.6248509
## [2,] 35.980133 5.727128 0.4472136
## [3,] 32.201863 1.769181 3.7854986
## [4,] 7.605919 24.029981 29.0110324
Next you can efficiently obtain each row's minimum via the rowMins function in the matrixStats package:
minD <- matrixStats::rowMins(d)
And, assuming there is a unique closest location in B, obtain its index via a (row-wise) comparison of d to minD:
ind <- (d == minD) %*% 1:ncol(d)
If there are multiple equally distant locations in B, you'll need some rule for which one to choose anyway.
Last, just stack the data together.
C <- data.frame(Name = A$Name,
                Temp = B$Temp[ind],
                Distance = minD)
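One caveat about scaling this up: with 1,500,000 rows in A and 5,000 in B, the full distance matrix would need roughly 1.5e6 x 5e3 x 8 bytes, about 60 GB. A chunked variant (a base-R sketch; assumes A and B are data frames with x and y columns as above) keeps the vectorized outer trick while bounding memory:

```r
# Nearest row of B for each row of A, processing A in chunks.
nearest_in_B <- function(A, B, chunk = 10000L) {
  n <- nrow(A)
  idx <- integer(n)   # index into B of the closest location
  dst <- numeric(n)   # distance to that location
  for (s in seq(1L, n, by = chunk)) {
    e <- min(s + chunk - 1L, n)
    d <- sqrt(outer(A$x[s:e], B$x, "-")^2 + outer(A$y[s:e], B$y, "-")^2)
    j <- max.col(-d, ties.method = "first")   # argmin of each row
    idx[s:e] <- j
    dst[s:e] <- d[cbind(seq_len(nrow(d)), j)]
  }
  data.frame(Name = A$Name, Temp = B$Temp[idx], Distance = dst)
}
```

For truly large inputs, a k-d tree search such as RANN::nn2(B[, c("x", "y")], A[, c("x", "y")], k = 1) avoids the pairwise distance computation entirely.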
The input data frame can have a varying number of columns.
The output data frame should have a single column that is a concatenation of all the columns in the input data frame.
Example: Input
╔══════╦══════╗
║ a ║ b ║
╠══════╬══════╣
║blue ║ 5636 ║
║red ║ 148 ║
║yellow║ 101 ║
║green ║ 959 ║
╚══════╩══════╝
Desired Output
╔═══════════╗
║ a-b ║
╠═══════════╣
║blue-5636 ║
║red-148 ║
║yellow-101 ║
║green-959 ║
╚═══════════╝
This example has 2 columns, but the input data frame can have any number of columns, so the solution should not rely on column names.
I tried using transform, but that requires specifying the columns:
outputDF = transform(inputDF, xyz = paste0(inputDF[, 1], '-', inputDF[, 2]))
Is there a way to collapse all input columns into a single column separated by '-'?
We can use do.call
v1 <- do.call(paste, c(inputDF, list(sep='-')))
v2 <- paste(names(inputDF), collapse='-')
setNames(data.frame(v1),v2)
# a-b
#1 blue-5636
#2 red-148
#3 yellow-101
#4 green-959
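The same result is possible with base R's Reduce, folding paste over the columns (a sketch; inputDF is the question's data frame):

```r
inputDF <- data.frame(a = c("blue", "red", "yellow", "green"),
                      b = c(5636, 148, 101, 959))

# A data frame is a list of columns, so Reduce pastes them pairwise
v1  <- Reduce(function(u, v) paste(u, v, sep = "-"), inputDF)
out <- setNames(data.frame(v1), paste(names(inputDF), collapse = "-"))
```

With more than two columns this folds left to right, giving the same "a-b-c" style concatenation as do.call(paste, ...).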
I'm trying to get better with dplyr and tidyr but I'm not used to "thinking in R". An example may be best. The table I've generated from my data in sql looks like this:
╔═══════════╦════════════╦═════╦════════╦══════════════╦══════════╦══════════════╗
║ patientid ║ had_stroke ║ age ║ gender ║ hypertension ║ diabetes ║ estrogen HRT ║
╠═══════════╬════════════╬═════╬════════╬══════════════╬══════════╬══════════════╣
║ 934988 ║ 1 ║ 65 ║ M ║ 1 ║ 1 ║ 0 ║
║ 94044 ║ 0 ║ 69 ║ F ║ 1 ║ 0 ║ 0 ║
║ 689348 ║ 0 ║ 56 ║ F ║ 0 ║ 1 ║ 1 ║
║ 902498 ║ 1 ║ 45 ║ M ║ 0 ║ 0 ║ 1 ║
║ … ║ ║ ║ ║ ║ ║ ║
╚═══════════╩════════════╩═════╩════════╩══════════════╩══════════╩══════════════╝
I would like to create an output table that conveys the following information:
╔══════════════╦════════╦══════════╦══════════╦══════════╦═══════════╗
║ ║ total ║M lt50 yo ║F lt50 yo ║M gte50yo ║F gte 50yo ║
╠══════════════╬════════╬══════════╬══════════╬══════════╬═══════════╣
║ estrogen HRT ║ 347 ║ 2 ║ 65 ║ 4 ║ 97 ║
║ diabetes ║ 13922 ║ 54 ║ 73 ║ 192 ║ 247 ║
║ hypertension ║ 8210 ║ 102 ║ 187 ║ 443 ║ 574 ║
╚══════════════╩════════╩══════════╩══════════╩══════════╩═══════════╝
Total is the total number of patients with that comorbidity (easy enough: sum(data$estrogen == 1) etc). The other cells are now the number of patients with that comorbidity in that age and gender stratification where had_stroke==1.
I'd love to just get a general idea of how to approach problems like this as it seems like a pretty fundamental way to transform data. If the total column makes it funky then feel free to exclude that.
Try a simpler approach.
I assume that you have a data.frame called data. Here is a toy data set:
set.seed(0)
data <- data.frame(estrogen = runif(100) < .10,
                   diabetes = runif(100) < .15,
                   hypertension = runif(100) < .20,
                   groups = cut(runif(100), c(0, .1, .4, .7, 1),
                                labels = c("my", "fy", "mo", "fo")))
Add a new variable to the data frame for the groups.
Then use table() to get the summaries:
res <- rbind(
  table(data$estrogen, data$groups)[2, ],
  table(data$diabetes, data$groups)[2, ],
  table(data$hypertension, data$groups)[2, ]
)
res <- cbind(apply(res, 1, sum), res)
Finally, use colnames(res) and rownames(res) to set appropriate names for the columns and rows:
colnames(res)[1] <- "Total"
rownames(res) <- c("estrogen", "diabetes", "hypertension")
Results
Total my fy mo fo
estrogen 12 2 2 4 4
diabetes 28 1 8 11 8
hypertension 27 1 10 11 5
So here is a data.table solution.
# create MRE - you have this already
n <- 1000
set.seed(1) # for reproducible example
df <- data.frame(ID = sample(1:n, n), had_stroke = sample(0:1, n, replace = TRUE),
                 age = sample(25:85, n, replace = TRUE),
                 gender = sample(c("M", "F"), n, replace = TRUE),
                 hypertension = sample(0:1, n, replace = TRUE),
                 diabetes = sample(0:1, n, replace = TRUE),
                 estrogen = sample(0:1, n, replace = TRUE))
# you start here.
library(data.table)
result <- melt(setDT(df),measure=5:7, variable.name="comorbidity")
result[, list(total = sum(value == 1),
              M.lt.50 = sum(value[gender == "M" & age <  50]),
              F.lt.50 = sum(value[gender == "F" & age <  50]),
              M.ge.50 = sum(value[gender == "M" & age >= 50]),
              F.ge.50 = sum(value[gender == "F" & age >= 50])),
       by = comorbidity]
# comorbidity total M.lt.50 F.lt.50 M.ge.50 F.ge.50
# 1: hypertension 521 104 126 143 148
# 2: diabetes 482 109 120 125 128
# 3: estrogen 492 99 126 119 148
I know you asked for dplyr/tidyr (and now that I've provided an MRE dataset, I'm sure you'll get one...). IMO data.table is a better option: the syntax is no worse, and it's almost always faster, usually by a factor of 10-100.
I have a table (data retrieved from a SQL database) in the form of:
╔═══════╦══════════════╗
║ Model ║ Manufacturer ║
╠═══════╬══════════════╣
║ A ║ 1 ║
║----------------------║
║ A ║ 2 ║
║----------------------║
║ A ║ 3 ║
║----------------------║
║ B ║ 4 ║
║----------------------║
║ B ║ 5 ║
║----------------------║
║ C ║ 6 ║
║----------------------║
║ D ║ 7 ║
║----------------------║
║ D ║ 8 ║
║----------------------║
║ D ║ 9 ║
║----------------------║
║ D ║ 10 ║
╚═══════╩══════════════╝
Each row is its own <tr> when I bind the data to an <asp:datagrid>. What I need, though, is:
╔═══════╦══════════════╗
║ Model ║ Manufacturer ║
╠═══════╬══════════════╣
║ A ║ 1 ║
║ ║ 2 ║
║ ║ 3 ║
║----------------------║
║ B ║ 4 ║
║ ║ 5 ║
║----------------------║
║ C ║ 6 ║
║----------------------║
║ D ║ 7 ║
║ ║ 8 ║
║ ║ 9 ║
║ ║ 10 ║
╚═══════╩══════════════╝
I have spent a lot of time searching and have tried a number of different things, most of them using LINQ. But my knowledge and understanding of LINQ is very limited. I think (but I could be completely wrong) the closest I've come is from this question/answer: Linq query to combine cell results?. My slightly modified version of that is:
Dim results = table.AsEnumerable().GroupBy(Function(x) x.Field(Of String)("Model")).[Select](Function(grouping) New With {
        .Key = grouping.Key,
        .CombinedModel = grouping.Aggregate(Function(a, b) a + " " + b)
    }).[Select](Function(x) New ModelRecord() With {
        .Manufacturer = x.Key.Manufacturer,
        .Model = x.CombinedModel
    })
But because of my lack of LINQ knowledge, I don't understand the "defining a concrete type to represent Row" (which creates a problem with ModelRecord() in my code).
I'm pretty much completely lost at this point. Am I over-complicating things? Going at it completely wrong? Any help here would be immensely appreciated.
This solution that I mentioned in my comments to your question works. Thanks for giving it a try.
Instead of using a DataGrid or GridView, I believe this is better suited to a Repeater, where you have better control of the formatting of the data. Then you could load the data into a Dictionary of Models with a List of Manufacturers (Dictionary<Model, List<Manufacturer>>). After loading the object up, you determine how to display it in the Repeater.