I'm working with NAICS data for all the counties in the US; there are 435,581 rows of data. Each county (county names are in columns A and B) has a series of businesses with associated NAICS codes in column C, a description of the business in column D, and the number of employees in column E. Each business has its own row, so each county has tens of rows associated with it. I was wondering if there is a way to rearrange the data so that each county has only one row, with the business codes as column titles and the number of employees as the values.
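This long-to-wide reshaping can be done in base R with reshape() (tidyr::pivot_wider() is a popular alternative). A minimal sketch with made-up data; the column names (state, county, naics, employees) are assumptions standing in for columns A–E:

```r
# toy data mimicking the described layout (column names are assumptions)
biz <- data.frame(
  state     = c("AL", "AL", "AL", "AK"),
  county    = c("Autauga", "Autauga", "Autauga", "Anchorage"),
  naics     = c("111", "112", "113", "111"),
  descr     = c("Crop prod.", "Animal prod.", "Forestry", "Crop prod."),
  employees = c(10, 20, 30, 40)
)

# drop the description, then spread the NAICS codes into columns:
# one row per county, employee counts as the cell values
wide <- reshape(biz[, c("state", "county", "naics", "employees")],
                idvar     = c("state", "county"),
                timevar   = "naics",
                direction = "wide")

# reshape() names the new columns "employees.111" etc.; strip the prefix
names(wide) <- sub("^employees\\.", "", names(wide))
```

Counties that lack a given code get NA in that column, which you may want to replace with 0 afterwards.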
I have added pictures so that you can see what I mean.
Before
What I'm looking for
Related
I'm totally new to R and I'm trying to analyze some healthcare data. I have a dataframe containing multiple different types of prices for a given medical procedure, of which there are 6 versions (231-236). The different prices are the Medicare price, chargemaster price, self-pay price, and commercial price. So there are 6x4=24 columns of data containing prices. Each row represents a different hospital. Is there a way to efficiently collect the medians and IQRs for each of the 24 columns and put them into a table?
So far I'm just using the summary function, e.g. summary(cabg$chargemaster_231), but it's very tedious to manually copy and paste the output values into a table. I'd appreciate any help!
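One way to avoid the copy-and-paste is to loop over the price columns with sapply() and bind the results into a single matrix. A sketch with invented data; the toy column names and the grep pattern are assumptions, chosen to match names like chargemaster_231:

```r
# toy stand-in for the real cabg data frame (column names are assumptions)
set.seed(1)
cabg <- data.frame(medicare_231     = rnorm(50, 100, 10),
                   chargemaster_231 = rnorm(50, 300, 50),
                   selfpay_231      = rnorm(50, 200, 30))

# pick out every column ending in _231 ... _236
price_cols <- grep("_23[1-6]$", names(cabg), value = TRUE)

# one row per column: median, quartiles, and IQR
stats <- t(sapply(cabg[price_cols], function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE)
  c(median = unname(q[2]), q25 = unname(q[1]),
    q75 = unname(q[3]), IQR = unname(q[3] - q[1]))
}))
```

The result is a matrix with one row per price column, which you can export in one go with write.csv(stats, "summary.csv").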
I have several data files and I showed their intersections with an upset plot. I now want to know what the unique values in each dataset are. For example, as in this picture, how can I extract the names/values of the 232 items unique to the Thriller category?
I first used union to combine all my data into a single data frame and then used setdiff(data1, all) to characterise the unique values, but nothing showed up, even though in my real upset plot there are 10 values unique to data1.
Thanks.
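The reason setdiff(data1, all) comes back empty is that data1 is a subset of the union all, so every one of its values gets removed. To find the values unique to one set, diff it against the union of the *other* sets instead. A small sketch with made-up vectors:

```r
# made-up example sets
thriller <- c("Movie A", "Movie B", "Movie C")
action   <- c("Movie B", "Movie D")
drama    <- c("Movie C", "Movie E")

# values that appear in thriller and in no other set
unique_to_thriller <- setdiff(thriller, union(action, drama))
unique_to_thriller  # "Movie A"
```

If your datasets are data frames rather than vectors, dplyr::setdiff() performs the same comparison row-wise.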
I have two data frames. One data frame is called Measurements and has 500 rows. The columns are PatientID, Value and M_Date. The other data frame is called Patients and has 80 rows and the columns are PatientID, P_Date.
Each patient ID in Patients is unique. For each row in Patients, I want to look at the set of measurements in Measurements with the same PatientID (there are maybe 6-7 per patient).
From this set of measurements, I want to identify the one with the M_Date closest to P_Date and append its Value to Patients in a new column. How do I do this? I tried using dplyr but can't figure out how to access two data frames at once within that framework.
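One base-R way to do this is to loop over the patients and, for each one, pick the measurement whose M_Date is nearest to P_Date. A sketch with toy data; the column names come from the question, the values are invented:

```r
# toy versions of the two data frames described in the question
Patients <- data.frame(PatientID = c(1, 2),
                       P_Date = as.Date(c("2020-03-10", "2020-04-01")))
Measurements <- data.frame(PatientID = c(1, 1, 2, 2),
                           Value = c(5, 7, 9, 11),
                           M_Date = as.Date(c("2020-03-01", "2020-03-12",
                                              "2020-03-20", "2020-05-15")))

# for each patient, keep the Value whose M_Date is closest to P_Date
Patients$ClosestValue <- sapply(seq_len(nrow(Patients)), function(i) {
  m <- Measurements[Measurements$PatientID == Patients$PatientID[i], ]
  if (nrow(m) == 0) return(NA_real_)
  m$Value[which.min(abs(as.numeric(m$M_Date - Patients$P_Date[i])))]
})
```

With only 80 patients and 6–7 measurements each, this per-patient loop is fast enough; which.min() breaks ties in favour of the earlier measurement.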
You probably want to install the survival package (install.packages("survival")) and use the neardate function in it to solve your problem.
There is a good example in the function's documentation.
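To expand on that: neardate() searches in one direction only ("prior" or "after"), so the documented trick for the absolutely closest date is to run it both ways and keep whichever match is nearer. A sketch under that assumption, reusing the column names from the question with invented toy data:

```r
library(survival)  # neardate() lives here; survival ships with R

Patients <- data.frame(PatientID = c(1, 2),
                       P_Date = as.Date(c("2020-03-10", "2020-04-01")))
Measurements <- data.frame(PatientID = c(1, 1, 2, 2),
                           Value = c(5, 7, 9, 11),
                           M_Date = as.Date(c("2020-03-01", "2020-03-12",
                                              "2020-03-20", "2020-05-15")))

# index of the nearest measurement on or before / on or after each P_Date
before <- neardate(Patients$PatientID, Measurements$PatientID,
                   Patients$P_Date, Measurements$M_Date, best = "prior")
after  <- neardate(Patients$PatientID, Measurements$PatientID,
                   Patients$P_Date, Measurements$M_Date, best = "after")

# keep whichever side is closer in time (prior wins ties here)
gap_b <- abs(Measurements$M_Date[before] - Patients$P_Date)
gap_a <- abs(Measurements$M_Date[after]  - Patients$P_Date)
best  <- ifelse(!is.na(gap_a) & (is.na(gap_b) | gap_a < gap_b), after, before)
Patients$ClosestValue <- Measurements$Value[best]
```

neardate() returns row indices into the second data frame (NA when a patient has no match on that side), which is why the last line indexes Measurements$Value.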
I have observed plant leaf number (column 'No. of fully expanded leaves') of different plants (Plant id) on different days (Measuring date). I want to group plants that share the same maximum number of fully expanded leaves (keeping all columns) and put each group in its own spreadsheet, meaning plants with different maxima end up in separate files. Here is what my data look like:
And here is what the data of a single plant looks like:
How can I do this in R?
Many thanks,
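One base-R way to do this is to compute each plant's maximum with ave(), split() the data on that maximum, and write one CSV per group. A sketch with toy data; the column names are simplified assumptions (your real headers contain spaces and would need backticks or renaming), and the files are written to tempdir() here just as an example:

```r
# toy data mimicking the described columns (names simplified/assumed)
plants <- data.frame(
  plant_id = c("A", "A", "B", "B", "C"),
  measuring_date = as.Date(c("2024-05-01", "2024-05-08",
                             "2024-05-01", "2024-05-08", "2024-05-01")),
  leaves = c(3, 5, 4, 5, 2)
)

# per-plant maximum leaf count, attached to every row of that plant
plants$max_leaves <- ave(plants$leaves, plants$plant_id, FUN = max)

# one data frame per distinct maximum -> one CSV per group
for (g in split(plants, plants$max_leaves)) {
  out <- file.path(tempdir(),
                   sprintf("max_%s_leaves.csv", unique(g$max_leaves)))
  write.csv(g, out, row.names = FALSE)
}
```

Here plants A and B share a maximum of 5, so they land in the same file, while plant C gets its own.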
I'll describe my data:
The first column contains corine_values, going from 1 to 50.
The second column contains bird_names; there are 70 different bird_names, and each corine_value has several bird_names.
The third column contains the sex of the bird.
The fourth column contains a V1 value (a measurement) that belongs to the category described by the first three columns.
I want to create a table where the row names are the bird_names: first all the females in alphabetical order, followed by all the males in alphabetical order. The column names should be the corine_values, from smallest to largest. The cells of the table should hold the corresponding V1 values.
I've been trying some things, but to be honest I'm just starting with R and don't really have a clue how to do it. I can sort the data, but not on multiple levels (alphabetical and sex combined). At the moment I'm exporting everything to Excel and doing it manually, which is very time-consuming.
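This can be automated in R so nothing has to go through Excel. A sketch with made-up rows (column names taken from the description): reshape to wide with base reshape(), then order the rows by sex first and bird_name second:

```r
# made-up rows in the described four-column layout
birds <- data.frame(
  corine_value = c(2, 1, 1, 2),
  bird_name = c("wren", "wren", "blackbird", "blackbird"),
  sex = c("f", "f", "m", "m"),
  V1 = c(0.4, 0.1, 0.2, 0.3)
)

# one row per bird, one V1.<corine_value> column per corine_value
wide <- reshape(birds, idvar = c("bird_name", "sex"),
                timevar = "corine_value", direction = "wide")

# order rows females A-Z then males A-Z, columns in ascending corine order
wide <- wide[order(wide$sex, wide$bird_name),
             c("bird_name", "sex",
               paste0("V1.", sort(unique(birds$corine_value))))]
```

order(wide$sex, wide$bird_name) is the multi-level sort you were missing: it sorts by sex and breaks ties alphabetically by name. tidyr::pivot_wider() plus dplyr::arrange(sex, bird_name) would do the same job with friendlier column names.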