I have a file that has 52560 row but only one column with different header names in the one row, so I need to separate that rows with their own values and columns. So the data frame has 52560 X 1 but I need 52560 X 19 (headings). I tried separate and split functions but that did not work. I am new with R programming.
Picture of data frame:
I think the values are separated by ';' . Hence create a list of 9 column headers. Read this file in R with separation ';' while assigning header to the columns.
Related
I have dozens of very heavy Excel files that I need to import into R (then rebind). Each file has 2 sheets, where the second sheet (name: "Results") consists of 100K rows at least and has about 350 columns.
I would like to read a subset of the sheet "Results" from each file by columns, but most importantly, by specific rows. Each "ID" in the data, has a main row and then multiple rows below which contain data in specific columns. I would like to read the main row only (this leaves each file with 50-400 rows (depending on the file) and 150 variables). The first column that numbers main rows does not have a header.
This is what the data looks like (simplified):
I would like to import only the rows whose first column isn't empty but numbered (i.e., 1., 13., 34., 211.) and particular columns, in this example columns 2,3,5 (i.e., name, ID, status). The desired output would be:
Is there a simple way to do this?
Let's say a is our excel file, as data frame.
library(readxl)
a <- as.data.frame(read_excel("Pattern/File.xlsx",sheet = "Results"))
For instance, we want to select columns 1 to 3, so use
subset(a[,1:3],is.na(a[1])==FALSE)
By this function, you are subsetting the input data frame with values different than NA in first column.
Output:
...1 name ID
1 1 Dan us1d
4 13 Nev sa2e
6 34 Sam il5a
Note first column name (" ...1 "). This is autogenerated by read_excel() function, but should not be a problem.
With a dataframe that has, for example, one column x that has nested or multiple values for some rows, how would i, for those rows that have multiple values for x, append duplicate rows to the dataframe, save that that they correspond to one value within x.
To try to explain better, see "mock dataframe pre-transform", below. Row 1 has values "webui, cli, mobile" for column "module", and what i want is to append three near copies of row 1 to the dataframe, one with module value "webui", one with module value "cli" and one with module value "mobile". I also then want to remove the the original row 1. A similar operation would occur for row 4, such that the final dataframe would have 7 rows (see "mock dataframe post-transform, below).
mock dataframe pre-transform
mock dataframe post-transform
When I use R, I try to extract specific rows which have some specific strings in one column.
The data structure as following
ERC1 20679 14959 9770 RAB6-interacting protein 2 isoform
I want to extract the rows which have RAB6 in the last column. That column still has some other words besides RAB6 so I can not use column = "RAB6" to get them. It's just like a search function in excel. Does anyone have any ideas?
Assuming that your data frame is df:
df[grep("^RAB6", df$column),]
If not all values start with RAB6 remove the^.
I have two excel files A and B. Excel file A has 6 columns with 10,000 rows. Assume that columns are named A-F. Excel file B has one column (A) with 3500 rows. Here is what I want to do-
I want to eliminate rows based on the cell values (ids) in column A (in excel file A) and have a dataframe without them. To further elaborate- I want to check each id in Column A (excelfile A) against all ids in columnA in excel file B. If the id in column A in excel file A matches with any of the listed ids in columnA of excel file B, then, I want to eliminate those rows with matching ids in excel file A.
I was able to do this in excel. I want to do the same with R as a crosscheck. I am new to R and I am learning. Could someone help me with a best way to do this?
I know how to subset rows based on header title and a particular value of a cell. But, in this case, I have data with 10,000 observations, of which, I want to eliminate at least 3500 through matching the ids.
I need to extract the columns from a dataset without header names.
I have a ~10000 x 3 data set and I need to plot the first column against the second two.
I know how to do it when the columns have names ~ plot(data$V1, data$V2) but in this case they do not. How do I access each column individually when they do not have names?
Thanks
Why not give them sensible names?
names(data)=c("This","That","Other")
plot(data$This,data$That)
That's a better solution than using the column number, since names are meaningful and if your data changes to have a different number of columns your code may break in several places. Give your data the correct names and as long as you always refer to data$This then your code will work.
I usually select columns by their position in the matrix/data frame.
e.g.
dataset[,4] to select the 4th column.
The 1st number in brackets refers to rows, the second to columns. Here, I didn't use a "1st number" so all rows of column 4 are selected, i.e., the whole column.
This is easy to remember since it stems from matrix calculations. E.g., a 4x3 dimensional matrix has 4 rows and 3 columns. Thus when I want to select the 1st row of the third column, I could do something like matrix[1,3]