I have observed plant leaf number (column 'No. of fully expanded leaves") of different plant (Plant id) on different days (Measuring date). I want to group plants with the same maximum No. of fully expanded leaves (with all columns included) and put them in the same spreadsheet, which means plants with different max leaves will be put into separate files. Here is what my data look like:
And here is what the data of a single plant looks like:
How can I do this in R?
Many thanks,
Related
I have the following data table in R, which I need to collapse for streamlined data processing. I can do this manually, but I am looking for the most efficient way possible. The data frame looks like this:
and so on. Each age group has 4 observations, 2 male and 2 female (1 of each type). And region consists of city1, city2, city3, etc. which are all ordered the same as the example above. After all age groups are exhausted, the next cityX begins.
I need to combine gender into the total, summing males and females (within type). I also need to combine all age groups to give a population total (sum all age groups). I need to keep type separate, and then later combine them as an additional column. I want the final rows output to be the region. I need the population totals for each year column. So the final output would be like this:
I know this could be done manually by splitting the data frame repeatedly, but what would be the most efficient way to do this?
I have two data frames. One data frame is called Measurements and has 500 rows. The columns are PatientID, Value and M_Date. The other data frame is called Patients and has 80 rows and the columns are PatientID, P_Date.
Each patient ID in Patients is unique. For each row in Patients, I want to look at the set of measurements in Measurements with the same PatientID (there are maybe 6-7 per patient).
From this set of measurements, I want to identify the one with M_Date closest to P_Date. I want to append this value to Patients in a new column. How do I do this? I tried using ddplyr but can't figure out how to access two data frames at once within this function.
you probably want to install the install.packages("survival") and the neardate function within it to solve your problem.
It has a good example in the documentation
I have a complex dataframe (orig_df). Of the 25 columns, 5 are descriptions and characteristics that I wish to use as grouping criteria. The remainder are time series. There are tens of thousands of rows.
I noted in initial analysis and numerical summary that there are significant issues with outlier observations within some of the specific grouping criteria. I used "group by" and looking at the quintile results within those groups. I would like to eliminate the low and high (individual observation) outliers relative to the (group-by based quintile) to improve the decision tree and clustering analytics. I also want to keep the outliers to analyze separately for the root cause.
How do I manipulate the dataframe such that the individual observations are compared to the group-based quintile results and the parse is saved (orig_df becomes ideal_df and outlier_df)?
After identifying the outliers using the link Nikos Tavoularis share above, you can use ifelse to create a new variable and identify which records are outliers and the ones that are not. This way you can keep the data there, but you can use this new variable to sort them out whenever you want
I have a column with a few dozen grades that have been assigned values Good, Average or Poor. I have a different column with employment rates. I want the maximum employment rate associated with Good, Average and Poor. I can get it to pull the value for each one in three different commands using the code below, but I need it written as a single command similar to this:
max(unHomework$Employment.Rate[unHomework$Job.Satisfaction.Category == 'Poor'])
We can use data.table
library(data.table)
setDT(unHomework)[, .(MaxER =max(Employment.Rate)), by = Job.Satisfaction.Category]
I'll describe my data:
First column are corine_values, going from 1 to 50.
Second column are bird_names, there are 70 different bird_names, each corine_value has several bird_names.
Third column contains the sex of the bird_name.
Fourth column contains a V1-value (measurement) that belongs to the category described by the first three columns.
I want to create a table where the the row names are the bird_names. First all the females in alphabetical order, followed by the males in alphabetical order. The column names should be the corine_values, from small to big. The data in the table should be the corresponding V1-values.
I've been trying some things, but to be honest I'm just starting with R and I don't really have a clue how to do it. I can sort the data, but not on multiple levels (like alphabetical and sex combined). I'm exporting everything to Excel now and doing it manually, which is very time-consuming.