Transform multiple entries in cell into new columns in R - r

I have a messy dataset with multiple entries in some cells. The numbers in parentheses refer to the specific columns "(1)", "(2)", and "(3)". In this example of a cell with multiple entries, 30 refers to column (2) and 20 refers to column (1); there is no information for column (3).
I would like to split up/extract the values in the cells and create 3 additional columns.
Several hundred cells are affected in several columns.
Dataset
In the end I would like to have 3 new columns for each column affected. Any idea how I do that? I'm still a rookie so help is much appreciated!
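The exact cell layout is not visible here, so the following is only a sketch. It assumes a cell looks like "30 (2) 20 (1)", i.e. each value is followed by its target column in parentheses. The helper `extract_tagged`, the column name `dose`, and the sample data are hypothetical and used only for illustration.

```r
# Hypothetical helper: return the number tagged "(k)" in each cell, or NA.
# Assumes cells look like "30 (2) 20 (1)" (value, then column tag).
extract_tagged <- function(cells, k) {
  cells <- as.character(cells)
  cells[is.na(cells)] <- ""                       # treat missing cells as empty
  pattern <- paste0("([0-9.]+)[[:space:]]*\\(", k, "\\)")
  hits <- regmatches(cells, regexec(pattern, cells))
  # Each hit is c(full match, captured number), or character(0) if no match
  values <- vapply(hits,
                   function(m) if (length(m) >= 2) m[2] else NA_character_,
                   character(1))
  as.numeric(values)
}

# Made-up example data with one affected column
messy <- data.frame(
  id   = 1:3,
  dose = c("30 (2) 20 (1)", "15 (3)", NA),
  stringsAsFactors = FALSE
)

# Create three new columns for every affected column
affected <- c("dose")
for (col in affected) {
  for (k in 1:3) {
    messy[[paste0(col, "_", k)]] <- extract_tagged(messy[[col]], k)
  }
}

print(messy)
```

To handle several affected columns, list their names in `affected`; each one gets its own `_1`, `_2`, `_3` columns. If the real cells use a different layout, only the regular expression in `pattern` should need changing.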

Related

How do I gather data that is spread across in various rows to a single row?

I have a dataframe that has 23 columns of various parameters defining a patient which I extracted using dplyr from a larger dataframe after pivoting it such that each of the parameters forms the columns of the new dataframe.
Now I am facing an issue. I am getting a lot of rows for the same patient. For each parameter, one of the rows shows the required value and the rest is denoted as NA. So if the same patient is repeated, say 10 times, in every parameter column there is one row with the actual value and the rest is NA.
How do I remove these NAs and gather the information that is scattered in this manner?
I want the 1 and 2 to be on the same row. All the rows seen in this image of dataframe are of the same person.
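One possible approach, sketched with dplyr since the question already uses it: group by patient and keep the first non-NA value in each column. The column names (`patient`, `age`, `weight`, `hb`), the sample data, and the helper `first_non_na` are made up for illustration.

```r
library(dplyr)

# Made-up data: each patient is spread over several rows,
# with one real value per parameter column and NA elsewhere
patients <- data.frame(
  patient = c("p1", "p1", "p1", "p2", "p2"),
  age     = c(40, NA, NA, NA, 63),
  weight  = c(NA, 72.5, NA, 80, NA),
  hb      = c(NA, NA, 13.1, NA, NA)
)

# Hypothetical helper: first non-NA value, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

collapsed <- patients %>%
  group_by(patient) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")

print(collapsed)
```

This assumes each patient has at most one real value per parameter. If a patient could have several different values in one column, the sketch silently keeps the first, which may not be what is wanted.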

R: stacking up values from rows of a data frame

I started programming in R yesterday (literally), and I am having the following issue:
-I have a data frame containing R rows, and each row contains N values.
Rows are identified by the first and second field, while the other N-2 are just numerical values or NA.
-Some rows have identical first field and identical second field, something like:
row 1: a,b, third_field, .. ,last_field
row 2: a,b, third_field, .. ,last_field
Usually the first row has some fields containing numbers and some containing NA, while the second row also contains NA and numbers, but distributed differently.
What I am trying to do is to merge the two rows (or records) according to these two rules:
1) if both rows have a NA on a given field, I keep NA
2) if one of the two has a number, I use that value; if both of the rows contain the same value, I keep it also.
How do you do this without looping over each field of each row? (With 1M rows and tens of fields, a loop will finish maybe tomorrow.)
I do not know how to better explain my problem. I am sorry for the lengthy explanation, thanks a lot.
EDIT: it is better if I add an example. The following two lines
a,b,NA,NA,NA,1,2 ,NA
a,b,NA,3 ,NA,1,NA,NA
should become
a,b,NA,3 ,NA,1,2 ,NA
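A minimal base R sketch of the two rules, using the example rows from the question: group by the two key fields and take the first non-NA value in every other field. The column names (`k1`, `k2`, `f3` to `f8`) and the helper `first_non_na` are hypothetical.

```r
# The two example rows from the question
records <- data.frame(
  k1 = c("a", "a"),
  k2 = c("b", "b"),
  f3 = c(NA_real_, NA_real_),
  f4 = c(NA, 3),
  f5 = c(NA_real_, NA_real_),
  f6 = c(1, 1),
  f7 = c(2, NA),
  f8 = c(NA_real_, NA_real_)
)

# Hypothetical helper: first non-NA value, or NA if every value is NA
first_non_na <- function(x) x[!is.na(x)][1]

# Group on the two key columns and collapse all remaining columns
merged <- aggregate(records[-(1:2)], by = records[1:2], FUN = first_non_na)

print(merged)
```

No explicit loop over fields is needed, but `aggregate` may still be slow on a million rows. The same first-non-NA idea can be applied per group with a grouped summary in a package such as dplyr or data.table, which is likely to be faster.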

Problems creating table from data in long format

I'll describe my data:
First column are corine_values, going from 1 to 50.
Second column are bird_names, there are 70 different bird_names, each corine_value has several bird_names.
Third column contains the sex of the bird_name.
Fourth column contains a V1-value (measurement) that belongs to the category described by the first three columns.
I want to create a table where the row names are the bird_names: first all the females in alphabetical order, followed by the males in alphabetical order. The column names should be the corine_values, from small to big. The data in the table should be the corresponding V1-values.
I've been trying some things, but to be honest I'm just starting with R and I don't really have a clue how to do it. I can sort the data, but not on multiple levels (like alphabetical and sex combined). I'm exporting everything to Excel now and doing it manually, which is very time-consuming.
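One way to build such a table in base R is sketched below. It assumes sex is coded as "F" and "M" and that each combination of bird, sex, and corine_value occurs at most once. The sample values and the extra `label` column are made up for illustration.

```r
# Made-up data in the long format described in the question
birds <- data.frame(
  corine_value = c(2, 1, 10, 1, 2, 1),
  bird_name    = c("wren", "robin", "robin", "wren", "robin", "robin"),
  sex          = c("F", "F", "F", "M", "M", "M"),
  V1           = c(0.4, 1.2, 0.7, 2.5, 3.1, 0.9)
)

# Row labels must be unique, so combine name and sex
birds$label <- paste0(birds$bird_name, " (", birds$sex, ")")

# Sort on two levels: sex first ("F" before "M"), then name alphabetically
row_order <- unique(birds$label[order(birds$sex, birds$bird_name)])
birds$label <- factor(birds$label, levels = row_order)

# Cross-tabulate: rows follow the factor levels, columns are the
# corine_values in increasing numeric order, empty cells become NA
wide <- tapply(birds$V1, list(birds$label, birds$corine_value), FUN = mean)

print(wide)
```

The key step for multi-level sorting is `order(birds$sex, birds$bird_name)`, which accepts several sort keys. The resulting matrix can be written out with `write.csv(wide, "birds_wide.csv")` if it still needs to end up in Excel (the file name is just an example).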

Displaying datatable column values in dollars in Shiny r

I have a datatable where one of the columns should be expressed in dollars and some as percentages. I've been looking around and I'm still not sure how to do it - seems like it would be easy?
The trickier part is I have another data table where only certain entries need to be expressed as dollars (i.e. not whole rows or whole columns) - is there a way to handle this?
Imagine your datatable (myData) is 2 columns by 10 rows.
You want the second column to be in dollars:
myData[,2]<-sapply(myData[,2],function(x) paste0("$",x))
Or, you want rows 6 to 10 in the first column to be percentages:
myData[6:10,1]<-sapply(myData[6:10,1],function(x) paste0(x,"%"))
Or, you want rows 1 to 5 in the second column to be in dollars, you can do:
myData[1:5,2]<-sapply(myData[1:5,2],function(x) paste0("$",x))
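As a variation on the lines above, here is a sketch that formats a separate display copy, so the numeric data stays available for calculations. It relies on `paste0` and `formatC` being vectorized, so `sapply` is not needed. The helpers `as_dollars` and `as_percent` and the sample data are hypothetical.

```r
# Hypothetical helpers: format numbers as display strings
as_dollars <- function(x) {
  paste0("$", formatC(x, format = "f", digits = 2, big.mark = ","))
}
as_percent <- function(x, digits = 1) {
  paste0(formatC(x, format = "f", digits = digits), "%")
}

# Made-up numeric data
sales <- data.frame(
  growth  = c(1.5, 2, 12.3),
  revenue = c(1234.5, 99, 1000000)
)

# Whole columns: build a character copy for display only
display <- data.frame(
  growth  = as_percent(sales$growth),
  revenue = as_dollars(sales$revenue),
  stringsAsFactors = FALSE
)

# Individual cells: the column must be character first,
# then only the chosen entries are reformatted
mixed <- as.character(sales$revenue)
mixed[1:2] <- as_dollars(sales$revenue[1:2])

print(display)
print(mixed)
```

Note that any column holding "$" or "%" strings becomes character, so sorting it sorts text rather than numbers. That is the main reason to keep the numeric original alongside the display copy.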

extract columns that don't have a header or name in R

I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the other two.
I know how to do it when the columns have names ~ plot(data$V1, data$V2) but in this case they do not. How do I access each column individually when they do not have names?
Thanks
Why not give them sensible names?
names(data)=c("This","That","Other")
plot(data$This,data$That)
That's a better solution than using the column number, since names are meaningful and if your data changes to have a different number of columns your code may break in several places. Give your data the correct names and as long as you always refer to data$This then your code will work.
I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The 1st number in brackets refers to rows, the second to columns. Here, I didn't use a "1st number" so all rows of column 4 are selected, i.e., the whole column.
This is easy to remember since it stems from matrix calculations. E.g., a 4x3 dimensional matrix has 4 rows and 3 columns. Thus when I want to select the 1st row of the third column, I could do something like matrix[1,3]
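To tie the two answers together, here is a small sketch with made-up numbers. When a file is read with `read.table(..., header = FALSE)`, R assigns the default names V1, V2, V3, so the columns can be reached either by those names or by position. Using `matplot` to draw both remaining columns against the first is one option among several.

```r
# Made-up 3-column data without a header line
data <- read.table(text = "1 10 100
2 20 400
3 30 900", header = FALSE)

names(data)   # default names assigned by read.table: "V1" "V2" "V3"

# Select by position: [[1]] gives the first column as a vector,
# [, 2:3] gives all rows of columns 2 and 3
x  <- data[[1]]
ys <- as.matrix(data[, 2:3])

# Plot columns 2 and 3 against column 1 in a single figure
matplot(x, ys, type = "l", xlab = "column 1", ylab = "columns 2 and 3")
```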

Resources