I'll describe my data:
The first column contains corine_values, ranging from 1 to 50.
The second column contains bird_names. There are 70 different bird_names, and each corine_value has several bird_names.
The third column contains the sex of the bird.
The fourth column contains a V1-value (a measurement) that belongs to the category described by the first three columns.
I want to create a table where the row names are the bird_names: first all the females in alphabetical order, followed by the males in alphabetical order. The column names should be the corine_values, from small to big. The data in the table should be the corresponding V1-values.
I've been trying some things, but to be honest I'm just starting with R and I don't really have a clue how to do it. I can sort the data, but not on multiple levels (like alphabetical and sex combined). I'm exporting everything to Excel now and doing it manually, which is very time-consuming.
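One way to build such a table in base R is `tapply()`, which cross-tabulates a value by two grouping keys. The sketch below is only an illustration: the data frame `birds`, its values, and the `"F"`/`"M"` coding of sex are made up, and it assumes each combination of corine_value, bird_name and sex occurs at most once.

```r
# Hypothetical example data; column names follow the description above.
birds <- data.frame(
  corine_value = c(1, 1, 2, 2, 1, 2, 1),
  bird_name    = c("robin", "robin", "robin", "wren", "wren", "robin", "blackbird"),
  sex          = c("F", "M", "F", "M", "M", "M", "F"),
  V1           = c(0.5, 0.7, 0.2, 0.9, 0.4, 0.1, 0.3)
)

# Row key: sex first, then name, so alphabetical order of the key puts
# all females (alphabetically) before all males (alphabetically).
row_key <- paste(birds$sex, birds$bird_name, sep = ": ")

# Rows = sex + bird_name, columns = corine_value (numeric order).
# With one value per cell, sum() just returns that value;
# combinations that do not occur become NA.
tab <- tapply(birds$V1, list(row_key, birds$corine_value), FUN = sum)

print(tab)
```

The result is a matrix, so it can be passed to `write.csv()` if it still has to go to Excel.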
I have several data files and I showed their intersections with an upset plot. I now want to know which values are unique to each dataset. For example, as in this picture, how can I extract the names/values of the 232 elements in the Thriller category?
I first used union to combine all my data into a single dataframe and then used setdiff, as in setdiff(data1,all), to find the unique values, but nothing shows up, while in my real upset plot I have 10 values unique to data1.
Thanks.
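A likely reason `setdiff(data1, all)` comes back empty is that `all` was built as a union that already contains `data1`, so every value of `data1` is removed. Comparing each set against the union of the *other* sets avoids that. A minimal sketch in base R, where the list `sets` and its contents are invented for illustration:

```r
# Hypothetical input: one character vector per dataset.
sets <- list(
  data1 = c("a", "b", "c", "d"),
  data2 = c("b", "c", "e"),
  data3 = c("c", "f")
)

# For each set, drop everything that appears in any of the other sets.
unique_to <- lapply(names(sets), function(nm) {
  others <- unlist(sets[names(sets) != nm], use.names = FALSE)
  setdiff(sets[[nm]], others)
})
names(unique_to) <- names(sets)

print(unique_to)
```

If the values live in a data frame column rather than a plain vector, pull the column out first (for example `data1$name`, a hypothetical column name) before building the list.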
I have a messy dataset with multiple entries in some cells. The numbers in parentheses refer to the specific columns "(1)", "(2)", and "(3)". In this example, the cell with multiple entries contains 30, which refers to column (2), and 20, which refers to column (1). There is no information for column (3).
I would like to split up/extract the values in the cells and create 3 additional columns.
Several hundred cells are affected in several columns.
In the end I would like to have 3 new columns for each column affected. Any idea how I do that? I'm still a rookie so help is much appreciated!
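The exact layout of the cells is not shown here, so the following is only a sketch under an assumed format: each entry is a number followed by its target column in parentheses, such as `"20 (1) 30 (2)"`. The helper `split_cell()`, the data frame `df` and the column `weight` are hypothetical names.

```r
# Turn one cell such as "20 (1) 30 (2)" into a vector of length 3,
# with NA for every target column that is not mentioned.
split_cell <- function(cell, n_targets = 3) {
  out <- rep(NA_real_, n_targets)
  if (is.na(cell)) return(out)
  # Find every "<number> (<target>)" pair in the cell.
  hits <- regmatches(cell, gregexpr("[0-9.]+[[:space:]]*\\([0-9]+\\)", cell))[[1]]
  for (h in hits) {
    value  <- as.numeric(sub("[[:space:]]*\\(.*$", "", h))
    target <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", h))
    if (target <= n_targets) out[target] <- value
  }
  out
}

# Hypothetical messy column.
df <- data.frame(
  id     = 1:3,
  weight = c("20 (1) 30 (2)", "5 (3)", NA),
  stringsAsFactors = FALSE
)

# One row per cell, one column per target, then bind to the original data.
parts <- t(vapply(df$weight, split_cell, numeric(3), USE.NAMES = FALSE))
colnames(parts) <- paste0("weight_", 1:3)
df <- cbind(df, parts)

print(df)
```

To cover several affected columns, the last three statements can be repeated in a loop over the column names. Cells that hold a plain number without parentheses come out as all NA in this sketch and would need their own rule.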
I want to create multiple columns based on the subjects and marks. The data in the Name and Age columns will remain the same for the different subjects, as shown.
My first table is input data and the second one is desired output.
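This is a long-to-wide reshape. Since the tables themselves are not reproduced here, the sketch below uses invented data and assumes the columns are called `Name`, `Age`, `Subject` and `Marks`; it relies on base R's `reshape()`.

```r
# Hypothetical input in long format: one row per student and subject.
marks <- data.frame(
  Name    = c("Ann", "Ann", "Bob", "Bob"),
  Age     = c(14, 14, 15, 15),
  Subject = c("Maths", "Science", "Maths", "Science"),
  Marks   = c(90, 85, 70, 75)
)

# One row per Name/Age, one Marks column per Subject.
wide <- reshape(
  marks,
  idvar     = c("Name", "Age"),
  timevar   = "Subject",
  v.names   = "Marks",
  direction = "wide"
)

# reshape() names the new columns "Marks.Maths", "Marks.Science", ...;
# strip the prefix so the subject alone is the column name.
names(wide) <- sub("^Marks\\.", "", names(wide))

print(wide)
```

If the tidyverse is already in use, `tidyr::pivot_wider()` does the same job with `names_from = Subject` and `values_from = Marks`.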
Basically I'm trying to sort through a large dataset where the first 3 numbers correspond to different texts. Before I can filter through, I'm trying to assign the different strings values.
Crops 101
Fishing 102
Livestock 103
Movies, TV, & Stage 201
In the larger dataset, there are hundreds of numbers such as 1018347 where the first three numbers correspond to crops and include the number of times that value appeared. The numbers after specify what type of crops, but for the purpose of my work I need to sort through the entire thing by the first three numbers and sum the amount for each time occurred. I'm fairly new to R and wasn't able to find a sufficient answer, so any help would be appreciated.
Not sure if I am getting your question correctly, but it seems you are looking for a way to first create a new variable based on the first three digits of the variable and afterwards summarize the results per category.
What could work is:
library(dplyr)

data %>%
  mutate(first_part = substr(variable, 1, 3)) %>%  # first three digits of the code
  group_by(first_part) %>%
  summarize(occurrences = n())                     # number of rows per prefix
The code above counts the number of times each "first_part" (the first three digits) occurs. The same can be done for the second part, or for both together.
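If the goal is to add up an amount per category rather than count rows, and to attach the category names, something along these lines might work in base R. The data frame `data`, the columns `code` and `amount`, and the values in them are assumptions for illustration; only the four prefix/name pairs come from the question.

```r
# Lookup table taken from the question.
lookup <- data.frame(
  prefix   = c("101", "102", "103", "201"),
  category = c("Crops", "Fishing", "Livestock", "Movies, TV, & Stage")
)

# Hypothetical data: a numeric code and how often it appeared.
data <- data.frame(
  code   = c(1018347, 1015520, 1021111, 2019999, 1030042),
  amount = c(3, 2, 5, 1, 4)
)

# sprintf() avoids scientific notation (e.g. "1e+06") when converting
# large numeric codes to text before taking the first three characters.
data$prefix <- substr(sprintf("%.0f", data$code), 1, 3)

# Sum the amounts per prefix, then attach the category names.
totals <- aggregate(amount ~ prefix, data = data, FUN = sum)
totals <- merge(totals, lookup, by = "prefix", all.x = TRUE)

print(totals)
```

`all.x = TRUE` keeps prefixes that are missing from the lookup table, with NA as their category, so unmatched codes are visible rather than silently dropped.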
I have a dataframe that has 23 columns of various parameters defining a patient which I extracted using dplyr from a larger dataframe after pivoting it such that each of the parameters forms the columns of the new dataframe.
Now I am facing an issue. I am getting a lot of rows for the same patient. For each parameter, one of the rows shows the required value and the rest is denoted as NA. So if the same patient is repeated, say 10 times, in every parameter column there is one row with the actual value and the rest is NA.
How do I remove these NAs and gather the information that is scattered in this manner?
I want the 1 and 2 to be on the same row. All the rows seen in this image of dataframe are of the same person.
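One approach is to collapse the rows per patient and keep the first non-missing value in every column. A minimal base R sketch, assuming there is an identifier column (called `patient_id` here, a made-up name) and at most one real value per patient and parameter; the parameter columns and numbers are invented:

```r
# Hypothetical data: each value sits on its own row, the rest is NA.
patients <- data.frame(
  patient_id = c(1, 1, 1, 2, 2),
  hb         = c(13.2, NA, NA, NA, 11.8),
  wbc        = c(NA, 6.1, NA, 7.4, NA),
  age        = c(NA, NA, 54, 61, NA)
)

# Return the first non-missing value of a vector, or NA if there is none.
first_non_na <- function(x) {
  keep <- x[!is.na(x)]
  if (length(keep) == 0) NA else keep[1]
}

# Apply it to every parameter column within each patient.
collapsed <- aggregate(
  patients[-1],
  by  = list(patient_id = patients$patient_id),
  FUN = first_non_na
)

print(collapsed)
```

With dplyr, the equivalent would be `group_by()` on the identifier followed by `summarise(across(everything(), ...))` with the same helper. It may also be worth checking the earlier pivot step: scattered NAs like this often appear when an extra column that differs per row was left in the data before pivoting.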