I have a messy dataset with multiple entries in some cells. The numbers in parentheses refer to the specific columns "(1)", "(2)", and "(3)". In this example,
a cell has multiple entries: 30 refers to column (2) and 20 refers to column (1). There is no information for column (3).
I would like to split up/extract the values in the cells and create 3 additional columns.
Several hundred cells are affected in several columns.
Dataset
In the end I would like to have 3 new columns for each column affected. Any idea how I do that? I'm still a rookie so help is much appreciated!
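One possible approach is sketched below in base R. The original data is not shown, so the cell format is an assumption: each entry is taken to look like "30 (2)", with several entries per cell separated by commas. The helper `split_cell` and the column name `x` are hypothetical names used for illustration only.

```r
# Assumed cell format (not confirmed by the post): "<value> (<column number>)",
# possibly several per cell, e.g. "30 (2), 20 (1)".
split_cell <- function(cell, n_cols = 3) {
  out <- rep(NA_real_, n_cols)
  if (is.na(cell)) return(out)
  # Find every "<value> (<k>)" pair in the cell.
  pairs <- regmatches(cell, gregexpr("[0-9.]+\\s*\\([0-9]+\\)", cell))[[1]]
  for (p in pairs) {
    value  <- as.numeric(sub("\\s*\\(.*$", "", p))            # part before "("
    target <- as.integer(sub("^.*\\(([0-9]+)\\)$", "\\1", p)) # number inside "()"
    if (target >= 1 && target <= n_cols) out[target] <- value
  }
  out
}

# Hypothetical example column "x"
df <- data.frame(x = c("30 (2), 20 (1)", "5 (3)", NA), stringsAsFactors = FALSE)

# One row per cell, one new column per target column
parts <- t(sapply(df$x, split_cell, USE.NAMES = FALSE))
colnames(parts) <- paste0("x_", 1:3)
result <- cbind(df, parts)
```

To cover several affected columns, the same helper could be applied to each one in a loop, with the new column names built from the original column name.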
I have dozens of very heavy Excel files that I need to import into R (and then combine with rbind). Each file has 2 sheets, where the second sheet (name: "Results") consists of at least 100K rows and has about 350 columns.
I would like to read a subset of the sheet "Results" from each file by columns, but most importantly, by specific rows. Each "ID" in the data has a main row followed by multiple rows below it, which contain data in specific columns. I would like to read the main rows only, which leaves each file with 50-400 rows (depending on the file) and 150 variables. The first column, which numbers the main rows, does not have a header.
This is what the data looks like (simplified):
I would like to import only the rows whose first column isn't empty but numbered (i.e., 1., 13., 34., 211.) and particular columns, in this example columns 2,3,5 (i.e., name, ID, status). The desired output would be:
Is there a simple way to do this?
Let's say a is our Excel file, read in as a data frame.
library(readxl)
a <- as.data.frame(read_excel("Pattern/File.xlsx",sheet = "Results"))
For instance, we want to select columns 1 to 3, so use
subset(a[, 1:3], !is.na(a[[1]]))
This keeps only the rows of the input data frame whose first column is not NA.
Output:
  ...1 name   ID
1    1  Dan us1d
4   13  Nev sa2e
6   34  Sam il5a
Note the first column name ("...1"). It is autogenerated by read_excel() because the column has no header, but it should not be a problem.
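The same filter can also pick the columns from the question (2, 3 and 5) in one step. The sketch below uses a small stand-in data frame instead of a real Excel file. The "detail" column and the status values are made up for illustration; only the name and ID values come from the output above.

```r
# Stand-in for the "Results" sheet. Sub-rows have NA in the first column.
# "detail" and the "status" values are hypothetical.
a <- data.frame(
  c(1, NA, NA, 13, NA, 34),
  c("Dan", NA, NA, "Nev", NA, "Sam"),
  c("us1d", NA, NA, "sa2e", NA, "il5a"),
  c(NA, "x", "y", NA, "z", NA),
  c("ok", NA, NA, "ok", NA, "ok"),
  stringsAsFactors = FALSE
)
names(a) <- c("...1", "name", "ID", "detail", "status")

# Keep rows whose first column is filled in, and only columns 2, 3 and 5
main_rows <- a[!is.na(a[[1]]), c(2, 3, 5)]
```

Note that this still reads the whole sheet before filtering, so it reduces what is kept in memory afterwards, not the time spent reading.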
I want to know how to combine a specific column of one file with a column of another file in R.
I want to subtract 50 from the maximum of each column. I tried this but it didn't work:
a <- 50-max(datafile1$X2018.03.06,datafile1$X2017.07.13)
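For what it's worth, max() called with two vectors returns a single overall maximum rather than one per column, and 50 - max(...) subtracts in the opposite direction from what the text describes. A minimal sketch of one way to do it follows. The data values and the second data frame `datafile2` with its column `X2019.01.01` are made up for illustration.

```r
# Hypothetical data; only the two datafile1 column names come from the post.
datafile1 <- data.frame(X2018.03.06 = c(10, 80, 65), X2017.07.13 = c(55, 40, 90))
datafile2 <- data.frame(X2019.01.01 = c(5, 70, 20))

# Combine one column from each file (both must have the same number of rows)
combined <- data.frame(
  X2018.03.06 = datafile1$X2018.03.06,
  X2019.01.01 = datafile2$X2019.01.01
)

# Maximum of each column, minus 50 (one result per column)
max_minus_50 <- sapply(datafile1, max, na.rm = TRUE) - 50
```

If the two files do not have matching row counts, the columns would need to be aligned on a shared key first, for example with merge().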
I am struggling with merging a number of Excel sheets into one data frame (or tibble). I have an Excel file containing 21 sheets, and every sheet shares the same 2 columns, which are identical across all 21 sheets. The number of rows is the same in every sheet. I don't know how to tackle this in the most efficient way.
After merging the sheets, I want to use dplyr::select to keep only specific columns.
I don't have enough experience to handle this myself. How would you tackle it?
Columns are our clients, and each of the observations is their energy demand per 15 min. So all the observations are of the same data type.
I have tried to make a tibble with the following names:
> colnames(test)
[1] "X871691600001976087"
[2] "X871691600001837791" etc
I have a list of column names that I want to extract:
> testnames
[1] "X871685900000003968"
[2] "X871685900000009600" etc
Some of those column names should match, but they don't. I get this warning:
selecttest <- select(test, one_of(testnames))
"Warning message:
Unknown variables: `871685900000003968`, etc.. (variables from testnames)"
Is this enough information to give me a hint?
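A rough sketch of one way to approach both steps in base R is shown below. The sheets are simulated as a list of data frames. With a real workbook, the list could probably be built with readxl::excel_sheets() and readxl::read_excel(). The shared column names `date` and `quarter` and all values are assumptions made for illustration.

```r
# Stand-in for the workbook: one data frame per sheet.
# "date" and "quarter" are hypothetical names for the 2 shared columns.
sheets <- list(
  data.frame(date = c("d1", "d2"), quarter = c(1, 2),
             X871691600001976087 = c(0.5, 0.7)),
  data.frame(date = c("d1", "d2"), quarter = c(1, 2),
             X871685900000003968 = c(1.2, 1.1))
)

# Merge all sheets on the shared columns, two at a time
merged <- Reduce(function(x, y) merge(x, y, by = c("date", "quarter")), sheets)

# Compare the wanted names with the names that actually exist
testnames <- c("X871685900000003968", "X871685900000009600")
present <- intersect(testnames, colnames(merged))
missing <- setdiff(testnames, colnames(merged))

# Select only the columns that are really there
selecttest <- merged[, present, drop = FALSE]
```

Looking at `missing` shows exactly which requested names have no counterpart in the data, which is usually the quickest way to see why a selection by name fails.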
In R, I am trying to extract the rows that contain a specific string in one column.
The data structure is as follows:
ERC1 20679 14959 9770 RAB6-interacting protein 2 isoform
I want to extract the rows which have RAB6 in the last column. That column has other words besides RAB6, so I cannot use column = "RAB6" to get them. It's like the search function in Excel. Does anyone have any ideas?
Assuming that your data frame is df:
df[grep("^RAB6", df$column),]
If not all values start with RAB6, remove the ^.
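Both variants can be tried on a small example. In this sketch only the first row is taken from the question. The other rows and the column names `gene` and `column` are made up for illustration. grepl() is used instead of grep() here. It returns a logical vector, which works the same way for this kind of row filter.

```r
# First row is from the question; the other two rows are hypothetical.
df <- data.frame(
  gene   = c("ERC1", "GENE2", "GENE3"),
  column = c("RAB6-interacting protein 2 isoform",
             "protein binding RAB6 partner",
             "unrelated kinase"),
  stringsAsFactors = FALSE
)

# Rows where the text starts with RAB6
starts_with <- df[grepl("^RAB6", df$column), ]

# Rows where RAB6 appears anywhere (fixed = TRUE treats it as plain text)
anywhere <- df[grepl("RAB6", df$column, fixed = TRUE), ]
```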