I have repeated measures of BMI for over 50K people, each measured at different timepoints. I want to know how many people took only 1 measurement, how many took only 2, and so on. How do I get this in a table with cumulative frequencies, like PROC FREQ in SAS?
I don't know how to get those mutually exclusive counts in a table format.
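One way to reproduce a PROC FREQ-style one-way table is to count measurements per person and then tabulate those counts. A minimal sketch with dplyr, assuming a data frame `bmi` with one row per measurement and a person identifier column `id` (both names are hypothetical):

```r
library(dplyr)

bmi %>%
  count(id, name = "n_measurements") %>%          # measurements per person
  count(n_measurements, name = "frequency") %>%   # people per measurement count
  mutate(
    percent       = 100 * frequency / sum(frequency),
    cum_frequency = cumsum(frequency),
    cum_percent   = cumsum(percent)
  )
```

In base R, `cumsum(table(table(bmi$id)))` gives the cumulative frequencies in one line, though without the percent columns.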
I am a cross country runner on a high school team, and I am using my limited knowledge of R and linear algebra to create a ranking index for xc teams.
I get my data from milesplit.com, but I am unsure if I am formatting this data properly. So far I have created matrices for each race, with odd columns holding runner scores and even columns holding times, where each team has a team_score and a team_time column. I want to analyze the growth of teams in a time series, but I have two questions about this:
(1) Can I combine all of these "race matrices" into a time series? Can I assign all the data in a race matrix a certain date, then make one big time series including all 25 race matrices I made?
(2) Am I closing myself off from insights by not including name and grade for each runner (as I only record time and score)? If so, how can I write a matrix that contains all this information?
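For question (1), one common approach (not tied to the original matrices, whose exact layout I am assuming here) is to convert each race matrix into a long data frame tagged with its date and stack them; grouping by date then gives a time series. All object names below (`race_to_df`, `mats`, `race_dates`) are hypothetical:

```r
library(dplyr)

# Turn one race matrix into a data frame with a date column; assumes
# team names are stored as rownames of the matrix.
race_to_df <- function(mat, race_date) {
  df <- as.data.frame(mat)
  df$team <- rownames(mat)
  df$date <- as.Date(race_date)
  df
}

mats       <- list(mat1, mat2)              # ... up to all 25 race matrices
race_dates <- c("2023-09-02", "2023-09-09") # one (hypothetical) date per race
all_races  <- bind_rows(Map(race_to_df, mats, race_dates))
```

On question (2): a matrix can hold only one data type, so adding runner names and grades would force everything to character. A long data frame with one row per runner per race (columns for name, grade, team, time, score, date) keeps all the information and avoids the odd/even column convention entirely.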
I have the following data table in R, which I need to collapse for streamlined data processing. I can do this manually, but I am looking for the most efficient way possible. The data frame looks like this:
and so on. Each age group has 4 observations: 2 male and 2 female (1 of each type). The region column consists of city1, city2, city3, etc., all ordered the same as in the example above. After all age groups are exhausted for one city, the next cityX begins.
I need to combine gender into a total, summing males and females (within type). I also need to combine all age groups to give a population total (summing all age groups). I need to keep type separate, and then later combine the types as an additional column. I want the final rows of the output to be one per region, with population totals for each year column. So the final output would look like this:
I know this could be done manually by splitting the data frame repeatedly, but what would be the most efficient way to do this?
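Since the example tables were not shown as text, here is a hedged sketch assuming columns `region`, `age`, `gender`, `type`, plus one numeric column per year named like `y2000`, `y2001`, and so on (all names hypothetical):

```r
library(dplyr)

# Sum over gender and age within region and type (keeps type separate):
by_type <- df %>%
  group_by(region, type) %>%
  summarise(across(starts_with("y"), sum), .groups = "drop")

# Combined-type totals per region, appended as extra rows:
totals <- by_type %>%
  group_by(region) %>%
  summarise(across(starts_with("y"), sum), .groups = "drop") %>%
  mutate(type = "total")

result <- bind_rows(by_type, totals) %>% arrange(region)
```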
So, I have a data set with the ages of bird chicks from day 2 to day 10 (2, 4, 6, 8, 10), and mass data for each chick on days 2, 4, 6, 8, and 10. However, not all chicks survive to day 10. How do I extract a data sheet in R, from the overall data sheet, that keeps only those individuals that have mass values for each of those days? And if I also wanted to sort them by Mass and Tarsus, how would I get a data set of those that have values for both variables on all of those days?
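A minimal sketch of the filtering step, assuming long-format data with one row per chick per day and columns `chick_id`, `day`, `mass`, `tarsus` (all names hypothetical):

```r
library(dplyr)

days <- c(2, 4, 6, 8, 10)

# Chicks with a non-missing mass on every measurement day:
complete_mass <- chicks %>%
  group_by(chick_id) %>%
  filter(all(days %in% day[!is.na(mass)])) %>%
  ungroup()

# Chicks with both mass and tarsus on every day, sorted by those columns:
complete_both <- chicks %>%
  group_by(chick_id) %>%
  filter(all(days %in% day[!is.na(mass) & !is.na(tarsus)])) %>%
  ungroup() %>%
  arrange(mass, tarsus)
```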
New to R and programming, but working on this skill to work through very large data sets. For this exercise, I would like to perform rank correlations between two types of samples for multiple analytes (>50) at various timepoints (i.e. baseline, month 1). I can work through it manually, analyte by analyte and timepoint by timepoint, but it would be awesome if I could get some support to automate comparing the analytes in each tissue at the various timepoints. Finally, I would like to generate a data table that lists all the correlations by rank. Any help would be much appreciated. I have included images of the data frame and some of my code below.
Here is an example of the table:
Here is some of my code to get the correlation for one analyte (IL-8):
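Since the data frame and code were shared only as images, here is a hedged sketch of how the comparisons could be automated, assuming a long format with columns `analyte`, `timepoint`, `sample1`, `sample2` (hypothetical names for the two sample types being compared):

```r
library(dplyr)

cor_table <- df %>%
  group_by(analyte, timepoint) %>%
  summarise(
    rho = cor(sample1, sample2, method = "spearman",
              use = "pairwise.complete.obs"),
    p_value = cor.test(sample1, sample2, method = "spearman",
                       exact = FALSE)$p.value,   # exact = FALSE tolerates ties
    .groups = "drop"
  ) %>%
  arrange(desc(abs(rho)))  # list all correlations by rank
```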
I am analysing data from a Delphi study and I need to create a vector of the frequency of each score (1:10) for each stakeholder group (6 groups, total of 73 participants) for each outcome (48). The data is in the form:
I would like to create a vector similar to:
# scores 1 through 9, in order
trialists <- c(0, 0, 0, 0, 28.6, 71.4, 0, 0, 0)
where each value is the percentage of a stakeholder group (e.g. trialists) that gave each score for a given outcome. I need to exclude scores of 10, as they represent "unable to answer".
This will result in 48 vectors for each of the 6 stakeholder groups.
Is there an elegant way to do this in R, rather than just plodding through the data in Excel and inputting it manually?
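A minimal sketch with dplyr and tidyr, assuming one row per participant per outcome with columns `group`, `outcome`, and `score` (hypothetical names):

```r
library(dplyr)
library(tidyr)

pct <- delphi %>%
  filter(score != 10) %>%                 # drop "unable to answer"
  count(group, outcome, score) %>%
  complete(group, outcome, score = 1:9,   # keep zero-frequency scores
           fill = list(n = 0)) %>%
  group_by(group, outcome) %>%
  mutate(percent = 100 * n / sum(n)) %>%
  ungroup()

# One row per group x outcome (6 x 48 = 288 "vectors"),
# with one percentage column per score:
pct_wide <- pct %>%
  select(-n) %>%
  pivot_wider(names_from = score, values_from = percent,
              names_prefix = "score_")
```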