I have data in Excel that should be grouped (the part I highlighted in the picture). The problem is that when I import it into R, it isn't treated as grouped. How can I fix this?
From the image you provided, these are 3 separate columns, so when you import the Excel file into RStudio, it will treat them as 3 different columns. If you want to combine the 3 columns into 1, there are solutions for that as well.
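One way to do the combining in base R is to paste the three columns together into a single grouping column and store it as a factor. This is only a sketch: the column names `part1`, `part2`, `part3`, and `value` and their contents are hypothetical stand-ins for the imported sheet, and `"_"` is an arbitrary separator.

```r
# Hypothetical stand-in for the imported sheet: three separate columns
# that together identify one group, plus a measurement column
df <- data.frame(
  part1 = c("A", "B", "C"),
  part2 = c("x", "y", "z"),
  part3 = c(1, 2, 3),
  value = c(10.5, 20.1, 30.7),
  stringsAsFactors = FALSE
)

# Combine the three columns into one string per row
df$group <- paste(df$part1, df$part2, df$part3, sep = "_")

# Keep only the combined group and the measurement
df <- df[, c("group", "value")]

# Store the group as a factor so R treats it as categorical
df$group <- factor(df$group)

print(df)
```

If you already use the tidyverse, `tidyr::unite()` does the same combine-and-drop step in one call.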
Excel worksheets can hold a maximum of about 1 million rows. Is there a row limit for CSV data, i.e., does Excel allow more than 1 million rows in CSV format?
One more question about this 1 million limit: can Excel hold more than 1 million data rows even though it only displays a maximum of 1 million?
CSV files have no limit on the number of rows you can add to them. Excel, however, won't hold more than about 1 million rows of data, even if you import a CSV file with more lines.
Excel will actually ask whether you want to proceed when importing more than 1 million data rows. It suggests importing the remaining data by running the Text Import Wizard again; you will need to set the appropriate starting row.
As I recall, Excel (versions 2007 and later) is limited to 2^20 = 1,048,576 rows.
A CSV file, like any ordinary text file, is not subject to this limit, so be careful when converting between the two formats.
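The arithmetic behind the limit can be checked directly in R. The helper below is a hypothetical sketch: it assumes every worksheet repeats the header row, so each sheet holds 2^20 - 1 data rows, and it reports how many sheets a CSV with a given number of data rows would need.

```r
# Excel 2007+ worksheet row limit
excel_max_rows <- 2^20
excel_max_rows == 1048576   # TRUE

# Hypothetical helper: worksheets needed for a CSV with n_data_rows rows,
# assuming each worksheet spends one row on a repeated header
sheets_needed <- function(n_data_rows, max_rows = 2^20) {
  ceiling(n_data_rows / (max_rows - 1))
}

sheets_needed(1048575)   # 1: fits on one sheet together with the header
sheets_needed(1048576)   # 2: one data row too many for a single sheet
sheets_needed(2500000)   # 3
```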
Another option, if the data is in a text file such as a CSV, is to import it with Excel's Text Import Wizard, which lets you choose the range of rows to import by row number. See: This link
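When the CSV is going into R instead of Excel, the same "start at row N" idea maps onto `read.csv()`'s `skip` and `nrows` arguments. This sketch reads a small temporary file in chunks; the file contents and `chunk_size` are hypothetical (for Excel-sized pieces it would be `2^20 - 1`), and counting rows with `readLines()` assumes no quoted field contains an embedded newline.

```r
# Hypothetical CSV written to a temporary file for the demonstration
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10, value = (1:10) * 2), csv_path, row.names = FALSE)

chunk_size <- 4  # hypothetical; use 2^20 - 1 to mimic one Excel sheet per chunk

# Read the header once so every chunk gets the same column names
col_names <- names(read.csv(csv_path, nrows = 1))

# Data rows = total lines minus the header line
n_data_rows <- length(readLines(csv_path)) - 1

chunks <- list()
for (start in seq(0, n_data_rows - 1, by = chunk_size)) {
  # skip = 1 + start skips the header plus the rows already read
  chunks[[length(chunks) + 1]] <- read.csv(
    csv_path,
    header = FALSE,
    skip = 1 + start,
    nrows = chunk_size,
    col.names = col_names
  )
}

sapply(chunks, nrow)  # rows per chunk
```

Each chunk re-scans the file from the top, which is fine for a sketch but slow for very large files; column types are also guessed separately for each chunk.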
I have several Excel files (*.xlsx) that I want to import into R, but each file has 6 to 7 tables in a single sheet, separated by chunks of text, as in the picture.
I know how to import several Excel files using a loop, but I cannot figure out how to select each of the tables spread across each sheet, skip the rows of text, and bind the tables together. Also, the tables in each Excel file start in different cells, so I cannot just specify a fixed cell coordinate to import them. Each Excel file also has a different number of rows. I'd appreciate any help.
For instance, the picture above is about Maryland (a US state), and I want to transform it into what is shown in the following picture:
Here is a toy file for anyone able to help: LINK
Thanks!
Based on the image of your data, it seems that all rows where the second column is NA can be removed. If so, subsetting in base R is straightforward:
test <- test[!is.na(test[,2]),]
Quick explanation:
test[ ,2] --> select column 2 (all rows)
is.na(test[ ,2]) --> return TRUE if cell is NA
!is.na(test[ ,2]) --> return FALSE if cell is NA
test[!is.na(test[,2]),] --> all rows of test dataframe where cell in col 2 is not NA
Again, based on the data you showed, this should work, but it is hard to be sure without real sample data.
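To connect this to the loop over several files, the same filter can be applied to each imported sheet and the results stacked with `do.call(rbind, ...)`. The toy sheets here are hypothetical: text rows carry NA in column 2 and table rows do not. With real files, each element of `sheets` might come from something like `readxl::read_excel(path, col_names = FALSE)`; that import step is an assumption, not part of the answer above. Note that each table's own header row has a value in column 2, so it survives this filter and may need a second, file-specific rule.

```r
# Hypothetical toy sheet: text rows have NA in column 2
make_sheet <- function(source_id) {
  data.frame(
    label  = c("Title text", "row_a", "row_b", "Note text", "row_c"),
    value  = c(NA, 1, 2, NA, 3),
    source = source_id,
    stringsAsFactors = FALSE
  )
}

# Stand-ins for sheets read from several files
sheets <- list(make_sheet("file1"), make_sheet("file2"))

# Apply the answer's filter to every sheet, then stack the results
cleaned  <- lapply(sheets, function(test) test[!is.na(test[, 2]), ])
combined <- do.call(rbind, cleaned)
rownames(combined) <- NULL  # reset the row names inherited from each sheet

print(combined)
```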
I'm new to R and exploring bioinformatics with it. I've run into a problem: I imported my Excel data into R by converting it to CSV and using the read.csv command. As you can see in the picture, there are 37 variables (columns), and the first column should be treated as a fixed factor. Downstream, I want to match this with another matrix that has only 36 variables. How can I reduce the number of variables by fixing the first column?
Many thanks in advance.
Sure, I added the str() output of my data here.
If I am not mistaken, what you are looking for is to set the "Gene" column as metadata, indicating which gene the values in each row correspond to. You can then try deleting the word "Gene" in the Excel file: when you import it with the read.csv() function, the first column is used for the row names if "there is a header and the first row contains one fewer field than the number of columns".
You can find more information about this function using ?read.csv
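Both routes can be checked on a tiny file. This is a sketch with hypothetical gene names and only 3 sample columns instead of 36: the first read passes `row.names = 1` to use the first column as row names without editing the file, and the second shows the automatic behavior when the "Gene" label is removed from the header.

```r
# Hypothetical gene table, header still contains "Gene"
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Gene,S1,S2,S3",
  "geneA,1,2,3",
  "geneB,4,5,6"
), csv_path)

# Option 1: explicitly use column 1 as row names
with_names <- read.csv(csv_path, row.names = 1)

# Option 2: header has one fewer field (the "Gene" label deleted),
# so read.csv uses the first column for the row names automatically
csv_path2 <- tempfile(fileext = ".csv")
writeLines(c(
  "S1,S2,S3",
  "geneA,1,2,3",
  "geneB,4,5,6"
), csv_path2)
auto_names <- read.csv(csv_path2)

# Either way only the sample columns remain, ready for a numeric matrix
m <- as.matrix(with_names)
dim(m)
```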
I am working with a few Excel tables (.xlsx) and would like to import my data into RStudio for further work.
mydata <- read.xlsx(file.choose(), 1)
The code above worked well until it reached a single row that had 4 columns, 3 of them filled with NAs.
When I then simply displayed what had been read, RStudio seemed to have created 3 extra columns filled with NAs. So there are 7 columns, 3 of which are unnecessary.
I'd like to keep only those 4 columns, with the NAs in them, just as they were in Excel.
I looked through the arguments I can pass to the read.xlsx function, but none of them solved this problem.
Thank you for your help in advance!
P.S. Here is a screenshot:
https://imgur.com/Yt5Az3M
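A possible workaround, assuming the extra columns are entirely NA, is to drop every column that has no non-NA value after import. The data frame below is a hypothetical stand-in for `mydata`; note that a genuine column that happens to be completely empty would also be dropped.

```r
# Hypothetical stand-in for `mydata`: 4 real columns (with some NAs)
# plus 3 spurious columns that are entirely NA
mydata <- data.frame(
  a = c(1, 2, NA),
  b = c("x", NA, "z"),
  c = c(NA, NA, 3),
  d = c(TRUE, FALSE, NA),
  extra1 = NA, extra2 = NA, extra3 = NA
)

# TRUE for columns containing at least one non-NA value
keep <- colSums(!is.na(mydata)) > 0

# drop = FALSE keeps a data frame even if only one column survives
mydata <- mydata[, keep, drop = FALSE]

names(mydata)
```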
When I import an Excel worksheet with the readxlsx function, the preview shows more than 100 columns in the data frame. But when I look inside the data frame, only the first 100 columns are visible. As a result, when I add some columns and then use writexlsx, those columns are omitted. Is there any way to avoid this?
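Before changing the import, it may help to check whether the columns are really missing from the data frame or are only not shown by the data viewer. This sketch uses a hypothetical 150-column data frame and inspects it in code rather than in the viewer.

```r
# Hypothetical data frame with 150 columns (named V1 ... V150)
df <- as.data.frame(matrix(seq_len(3 * 150), nrow = 3))

# Count columns in code, independent of what the viewer displays
ncol(df)

# Look at columns beyond the first 100 directly
str(df[, 101:105])

# Add a column and confirm it is present before writing the file
df$new_col <- 1:3
ncol(df)
```

If `ncol()` reports all the columns, the data frame itself is complete and the problem lies in the display or the write step, not the import.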
Regards,
Rafał