I am querying a table that has one field whose values I want across the top of my report, and several other values I want to display down the side. Perhaps a cross-tab will work, but if so, how? (I'm open to suggestions for using something besides a cross-tab, or for formatting the original data differently.)
My data looks something like this:
Record Type  Value1  Value2  PercentV1V2  Value3  PercentV1V3
TypeA        10      100     10           50      20
TypeB        20      40      50           200     10
TypeC        50      100     50           50      100
And, I would like my output to look like this (this formatting is not so negotiable)
             TypeA  TypeB  TypeC
Set1Info
Value1       10     20     50
Value2       100    40     100
PercentV1V2  10     50     50
Set2Info
Value3       50     200    50
PercentV1V3  20     10     100
I've been messing with a cross-tab and I can get the first value set. But there doesn't appear to be a way to add a new row to the cross-tab.
I think I could format the data differently and use a cross-tab, but I'd prefer doing the calculations in my procedure, not in Crystal, if possible. They won't be simple totals, in any case.
So, if I formatted my data like so:
Record Type  ValueType    Value
TypeA        Value1       10
TypeA        Value2       100
TypeA        PercentV1V2  10     <-- this would take another pass to calculate
TypeB        Value1       20
etc, then I think the cross-tab would work better. But even then, would I be able to make formatting changes between Set1Info and Set2Info?
Or, is there a way with the original data to get the Types across the top and the values down, but without using a cross-tab? I can hard-code the headers, but that means every number displayed would have to be a formula, right? (I'd need to match up the correct number with each RecordType and ValueType.)
It seems like I'm overlooking something obvious, an easy way to approach this. But sometimes with Crystal, there is no easy way.
To the extent I understand your problem and requirement, it looks like what you have done is correct.
The only addition that comes to mind is to use an Embedded Summary.
Use the embedded summary to add a blank row where required, and add your Set2Info as its header.
Solution 2:
One option you can try is to create a formula, @Group, that manually separates the rows into groups (this assumes the reformatted data, where the ValueType field holds "Value1", "Value2", and so on):
if ValueType = "Value1" or ValueType = "Value2" or ValueType = "PercentV1V2"
then "Set1Info"
else "Set2Info"
Then go to the Cross-Tab Expert and place the formula first in the Rows section, so that the rows are grouped as required and each group shows a header.
I posted yesterday another question but I feel I need to clarify it.
Let's say I have this code
md.NAME <- subset(MyData, HotelName == "ALAMEDA")
md.NAME.fc <- subset(md.NAME, TIPO == "FORECAST")
md.NAME.fc.bar <- subset(md.NAME.fc, Market.Segment == "BAR")
What I want is for NAME to change according to a variable that is set before those 3 lines are run.
So NAME is dynamic in the sense that, before these 3 lines, I could say that NAME is now equal to JOHN, and later that NAME is now equal to PATRIC.
So after running those 3 lines, twice (once for John and once for Patric) somehow in the environment I will get something like this:
6 dataframes, 3 for JOHN and 3 for PATRIC
DATAFRAME 1 WILL BE md.JOHN
DATAFRAME 2 WILL BE md.JOHN.fc
DATAFRAME 3 WILL BE md.JOHN.fc.bar
DATAFRAME 1 WILL BE md.PATRIC
DATAFRAME 2 WILL BE md.PATRIC.fc
DATAFRAME 3 WILL BE md.PATRIC.fc.bar
All the answers I had so far would help me only if "md" and "fc" or "fc.bar" are always the same. But I will have several variables like this, which will change a lot as far as the naming goes. So, it is the center part (NAME) the only one that should change.
I could even have something like:
md.test$NAME <- ...
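One hedged way to approach this is to keep the data frames in a named list whose names are built from the variable, instead of creating separate variables. The sketch below is only an illustration: make_subsets is a hypothetical helper name, and the toy MyData values are invented; only the column names HotelName, TIPO and Market.Segment come from the question.

```r
# Toy stand-in for MyData (values invented for illustration only)
MyData <- data.frame(
  HotelName      = c("JOHN", "JOHN", "JOHN", "PATRIC", "PATRIC"),
  TIPO           = c("FORECAST", "FORECAST", "ACTUAL", "FORECAST", "ACTUAL"),
  Market.Segment = c("BAR", "CORP", "BAR", "BAR", "BAR")
)

# make_subsets() is a hypothetical helper, not part of any package.
# It returns the three nested subsets, named after the value of `name`.
make_subsets <- function(data, name) {
  md  <- data[data$HotelName == name, ]
  fc  <- md[md$TIPO == "FORECAST", ]
  bar <- fc[fc$Market.Segment == "BAR", ]
  out <- list(md, fc, bar)
  names(out) <- paste0("md.", name, c("", ".fc", ".fc.bar"))
  out
}

# Six data frames in one named list: three for JOHN, three for PATRIC
frames <- c(make_subsets(MyData, "JOHN"), make_subsets(MyData, "PATRIC"))
names(frames)
frames[["md.JOHN.fc.bar"]]
```

If separate variables in the global environment are really required, list2env(frames, envir = globalenv()) should create them from the list, although the list itself is usually easier to loop over.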
I'm fairly new to R. I'm working with a data set that is incredibly redundant with a lot of columns (~400). There are several duplicate column names, however the data is not duplicate, so I need to sum the columns when collapsing them.
The columns all have a similar name that allows easy identification, so I'm hoping I can use that to my advantage.
I attempted to perform the following:
ColNames <- unique(colnames(df))
CombinedDf <- data.frame(sapply(ColNames, function(i) rowSums(df[, colnames(df) == i, drop = FALSE])))
This works if I sum over the range of columns that only contain integers, but the issue is that other columns have strings and such in them, so rowSums throws a fit.
Assuming that the identifier is "XXX", how can I aggregate all the columns that are of the same name leaving the other columns as is?
Thank you for your time.
Edit: Sample data has been asked for, I cannot give the exact data as it is sensitive, but I will give an example:
Name COL1XXX COL2XXX COL1XXX COL3XXX COL2XXX Type
Henry 5 15 25 31 1 Orange
Tom 8 16 12 4 3 Green
Should return
Name COL1XXX COL2XXX COL3XXX Type
Henry 30 16 31 Orange
Tom 20 19 4 Green
I'm not really sure, but you may try transposing the data and then aggregating by unique names.
t_df=as.data.frame(t(df))
new_df=aggregate(t_df, by=list(rownames(t_df)),sum)
Again, without sample data I'm unsure if it'll work, but based on what you said, that might work.
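Transposing a data frame that mixes strings and numbers produces character data, so sum would likely fail there. As an alternative, here is a minimal base R sketch that sums only the columns whose names contain the identifier and leaves the others untouched. It assumes the identifier is the literal text "XXX" and that every matching column is numeric; the data frame is rebuilt from the sample in the question.

```r
# Sample data from the question; check.names = FALSE keeps the duplicate names
df <- data.frame(
  Name    = c("Henry", "Tom"),
  COL1XXX = c(5, 8),
  COL2XXX = c(15, 16),
  COL1XXX = c(25, 12),
  COL3XXX = c(31, 4),
  COL2XXX = c(1, 3),
  Type    = c("Orange", "Green"),
  check.names = FALSE
)

# Columns to collapse: those whose name contains the identifier
is_xxx    <- grepl("XXX", names(df), fixed = TRUE)
xxx_names <- unique(names(df)[is_xxx])

# Row-wise sum over every column that shares a name
summed <- lapply(xxx_names, function(nm) {
  unname(rowSums(df[, names(df) == nm, drop = FALSE]))
})
names(summed) <- xxx_names

# Re-attach the untouched columns and restore the original column order
result <- cbind(df[!is_xxx], as.data.frame(summed))
result <- result[unique(names(df))]
result
```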
I have the following data frame:
group_id date_show date_med
1 1976-02-07 1971-04-14
1 1976-02-09 1976-12-11
1 2011-03-02 1970-03-22
2 1993-08-04 1997-06-13
2 2008-07-25 2006-09-01
2 2009-06-18 2005-11-12
3 2009-06-18 1999-11-03
I want to subset my data frame in such a way that the new data frame only shows the rows in which the values of date_show are further than 10 days apart but this condition should only be applied per group. I.e. if the values in the date_show column are less than 10 days apart but the group_ids are different, I need to keep both entries. What I want my result to look like based on the above table is:
group_id date_show date_med
1 1976-02-07 1971-04-14
1 2011-03-02 1970-03-22
2 1993-08-04 1997-06-13
2 2008-07-25 2006-09-01
2 2009-06-18 2005-11-12
3 2009-06-18 1999-11-03
Which row gets deleted isn't important, because the reason I'm subsetting in the first place is to count the rows I am left with after applying this criterion.
I've tried playing around with the diff function but I'm not sure how to go about it in the simplest possible way because this problem is already within another sapply function so I'm trying to avoid any kind of additional loop (in this case by group_id).
The df I'm working with has around 100 000 rows. Ideally, I would like to do this with base R because I have no rights to install any additional packages on the machine I'm working on but if this is not possible (or if solving this with an additional package would be significantly better), I can try and ask my admin to install it.
Any tips would be appreciated!
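A possible base R sketch, without an explicit loop over group_id, is shown below. It rests on one assumption that should be stated loudly: a row is dropped when it is 10 days or less after the previous date_show in the same group (consecutive gaps only). For long chains of closely spaced dates this can remove more rows than a rule that compares against the last kept row.

```r
# Sample data from the question
df <- data.frame(
  group_id  = c(1, 1, 1, 2, 2, 2, 3),
  date_show = as.Date(c("1976-02-07", "1976-02-09", "2011-03-02",
                        "1993-08-04", "2008-07-25", "2009-06-18",
                        "2009-06-18")),
  date_med  = as.Date(c("1971-04-14", "1976-12-11", "1970-03-22",
                        "1997-06-13", "2006-09-01", "2005-11-12",
                        "1999-11-03"))
)

# Sort so that diff() compares neighbouring dates within each group
df <- df[order(df$group_id, df$date_show), ]

# Gap in days to the previous row of the same group (Inf for the first row)
gap <- ave(as.numeric(df$date_show), df$group_id,
           FUN = function(d) c(Inf, diff(d)))

# Keep rows that are more than 10 days after their predecessor
kept <- df[gap > 10, ]
nrow(kept)
```

ave applies the function per group and returns a vector in the original row order, which is what makes a separate loop unnecessary here.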
In SparkR I have a DataFrame data. It contains time, game and id.
head(data)
then gives ID = 1 4 1 1 215 985 ..., game = 1 5 1 10 and time 2012-2-1, 2013-9-9, ...
Now game contains a gametype which is numbers from 1 to 10.
For a given gametype I want to find the minimum time, meaning the first time this game has been played. For gametype 1 I do this
data1 <- filter(data, data$game == 1)
This new data contains all data for gametype 1. To find the minimum time I do this
g <- groupBy(data1, game$time)
first(arrange(g, desc(g$time)))
but this doesn't run in SparkR. It says "object of type S4 is not subsettable".
Game 1 has been played 2012-01-02, 2013-05-04, 2011-01-04,... I would like to find the minimum-time.
If all you want is the minimum time, sorting the whole data set doesn't make sense. You can simply use min:
agg(df, min(df$time))
or for each type of game:
library(magrittr)  # provides the %>% pipe
groupBy(df, df$game) %>% agg(min(df$time))
By typing
arrange(game, game$time)
I get all of the times sorted. Applying the first function to the result gives the first entry. If I want the last entry, I simply type this
first(arrange(game, desc(game$time)))
Just to clarify, because this is something I keep running into: the error you were getting is probably because you also loaded dplyr into your environment. If you had used SparkR::first(SparkR::arrange(g, SparkR::desc(g$time))), things would probably have been fine (although obviously the query could have been more efficient).
I'm new to R and I am having some trouble iterating through the unique elements of a vector. I have a dataframe "School" with 700 different teachers. Each teacher has around 40 students.
I want to be able to loop through each teacher, create a graph of the mean score of his/her students over time, save the graphs in a folder and automatically email that folder to that teacher.
I'm just getting started and am having trouble setting up the for-loop. In Stata, I know how to loop through each unique element in a list, but am having trouble doing that in R. Any help would be appreciated.
School$Teacher  School$Student  School$ScoreNovember  School$ScoreDec  School$TeacherEmail
A               1               35                    45               A@school.org
A               2               43                    65               A@school.org
B               1               66                    54               B@school.org
A               3               97                    99               A@school.org
C               1               23                    45               C@school.org
Your question seems a bit vague and it looks like you want us to write your whole project. Could you share what you have done so far and where exactly you are struggling?
see ?subset
School <- data.frame(Teacher = c("A", "B"), ScoreNovember = 10:11, ScoreDec = 13:14)
for (teacher in unique(School$Teacher)) {
  # rows belonging to the current teacher only
  teacher_df <- subset(School, Teacher == teacher)
  MeanScoreNovember <- mean(teacher_df$ScoreNovember)
  MeanScoreDec <- mean(teacher_df$ScoreDec)
  # do your plot
  # send your email
}
I think you have 3 questions here, each of which needs its own separate question. How do I:
1. Create graphs
2. Automatically email output
3. Compute a subset mean based on group
For the 3rd one, I like using the plyr package; other people will recommend the data.table or dplyr packages. You can also use aggregate from base. To get a teacher's mean:
library(plyr)
ddply(School,.(Teacher),summarise,Nov_m=mean(ScoreNovember))
If you want the mean per student per teacher, etc., just add the extra grouping columns inside the brackets, like:
library(plyr)
ddply(School,.(Teacher,Student),summarise,Nov_m=mean(ScoreNovember))
You could do that for each score column (and then chart it). If your data were long rather than wide, you could also add the date ('November', 'Dec') as a group in the brackets. Otherwise, compute both means at once:
library(plyr)
ddply(School,.(Teacher,Student),summarise,Nov_m=mean(ScoreNovember),Dec_m=mean(ScoreDec))
See if that helps with the 3rd, but look at splitting your questions up too.
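For the base R route mentioned above, a minimal sketch with aggregate might look like the following. It uses the sample rows from the question and needs no extra packages; the name means is just an illustrative variable.

```r
# Sample rows from the question (email column omitted for brevity)
School <- data.frame(
  Teacher       = c("A", "A", "B", "A", "C"),
  Student       = c(1, 2, 1, 3, 1),
  ScoreNovember = c(35, 43, 66, 97, 23),
  ScoreDec      = c(45, 65, 54, 99, 45)
)

# Mean of both score columns per teacher, one row per teacher
means <- aggregate(cbind(ScoreNovember, ScoreDec) ~ Teacher,
                   data = School, FUN = mean)
means
```

Each row of means could then feed the plotting step for that teacher.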