When I try to import an Excel worksheet with the readxlsx function, the preview shows that more than 100 columns are inserted into the data frame, but when I look inside the data frame only the first 100 columns are visible. As a result, adding some columns and then using writexlsx omits those columns. Is there any way to avoid this situation?
Regards,
Rafał
I have several Excel files (*.xlsx) and I want to import them into R, but each file has 6 to 7 tables in a single sheet, separated by chunks of text, like in the picture.
I know how to import several Excel files using a loop, but my issue is that I cannot figure out how to select each of the tables distributed along each sheet, avoiding the rows with text, and bind them. Also, each table in each Excel file starts in a different cell, so I cannot just define a coordinate (a specific cell) to import the tables. Every Excel file differs in its number of rows. I'll appreciate any help.
For instance, the picture above is about Maryland (a US state), and I want to transform it into what is presented in the following picture:
This is a toy file to anyone able to help me: LINK
Thanks!
Based on the image of the data you showed, it seems that every row where the second column has an NA can be removed? In that case, subsetting in base R is pretty straightforward:
test <- test[!is.na(test[,2]),]
Quick explanation:
test[ ,2] --> evaluate all rows in column 2
is.na(test[ ,2]) --> return TRUE if cell is NA
!is.na(test[ ,2]) --> return FALSE if cell is NA
test[!is.na(test[,2]),] --> all rows of test dataframe where cell in col 2 is not NA
Again, based on the data you showed this should work, but it's hard to be sure without a true sample of the data.
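For completeness, here is a minimal sketch of the whole step, assuming the toy file is saved as "maryland.xlsx" (a placeholder name) and that the second column really is empty (NA) on every text row:
library(readxl)

# read the raw sheet without treating the first row as a header (an assumption)
test <- read_excel("maryland.xlsx", col_names = FALSE)

# keep only the rows where the second column is filled in
test <- test[!is.na(test[[2]]), ]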
I have data from Excel that should be grouped data (the part I highlight in the picture). The problem is that when I import it into R, it won't consider those data as grouped. How can I fix the problem?
As far as I can see from the image you provided, they are 3 separate columns, so when you import the Excel file into RStudio it will treat them as 3 different columns. However, if you want to unite the 3 columns into 1, there are also solutions for that.
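For example, a small sketch using tidyr::unite(), assuming the imported data frame is called dat and the three columns are named col1, col2 and col3 (placeholders for your real column names):
library(tidyr)

# paste the three columns together into a single new column called "group"
dat <- unite(dat, "group", col1, col2, col3, sep = "_")
Adjust sep (and add remove = FALSE if you want to keep the original columns) as needed.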
I'm new to R and am exploring bioinformatics with it. Right now I've run into a problem: I imported my Excel data into R by converting it to CSV format and using the read.csv command. As you can see in the picture, there are 37 variables (columns), where the first column is supposed to be treated as a fixed factor, and I would like to match it with another matrix that has only 36 variables in the downstream processing. What should I do to reduce the number of variables by fixing the first column?
Many thanks in advance.
Sure, I added the str() output of my data here.
If I am not mistaken, what you are looking for is setting the "Gene" column as metadata, indicating which gene the values in every row correspond to. You can try deleting the word "Gene" (the first column's header) in the Excel file, because when you import it with the read.csv() function the first column is automatically used as row names when "there is a header and the first row contains one fewer field than the number of columns".
You can find more information about this function using ?read.csv
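As an alternative that keeps the header, you can tell read.csv() explicitly to use the first column as row names, which reduces the data frame to 36 columns. A minimal sketch, assuming your file is called "expression.csv" (a placeholder) and the gene names sit in the first column:
# use column 1 (the gene names) as row names instead of as a variable
dat <- read.csv("expression.csv", row.names = 1)
ncol(dat)   # now 36 instead of 37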
I am new to R and coding in general, so please bear with me.
I have a spreadsheet that has 7 sheets; 6 of these sheets are formatted in the same way, and I am skipping the one that is not formatted the same way.
The code I have is thus:
library(readxl)

lst <- lapply(2:7,
              function(i) read_excel("CONFIDENTIAL Ratio 062018.xlsx", sheet = i)
)
This code was taken from this post: How to import multiple xlsx sheets in R
So far so good, the formula works and I have a large list with 6 sub lists that appears to represent all of my data.
It is at this point that I get stuck; being so new, I do not understand lists yet, and I really need the list to be merged into one single data frame that looks and feels like the source data (so columns and rows).
I cannot work out how to get from a list to a single data frame. I've tried using rbind() and other suggestions from here, but they all seem to either fail or only partially work, and I end up with a data frame that looks like a list, etc.
If each sheet has the same number of columns (ncol) and the same names (colnames), then this will work. It needs the dplyr package.
require(dplyr)
my_dataframe <- bind_rows(my_list)
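In your case the list is called lst, so the same idea looks like this (a sketch; the .id argument is optional and simply records which sheet each row came from):
require(dplyr)

# stack the six data frames in lst row-wise; .id = "sheet" adds an identifier column
my_dataframe <- bind_rows(lst, .id = "sheet")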
I have a huge data frame df in R. Now, if I invoke View(df), then RStudio does not respond since the data frame is too big. So I am wondering if there is any way to view, say, the first 500 lines of a data frame as a spreadsheet.
(I know it's possible to view it using head, but I want to see it as a spreadsheet since it has too many columns, and using head with that many columns is not really user friendly.)
If you want to see the first 100 lines of the data frame df as a spreadsheet, use
View(head(df,100))
You can also subset the data frame like a matrix, df[rowfrom:rowto, columnfrom:columnto], and pass the result to View(); for example, in your case:
View(df[1:500, ])
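Since you also mention having too many columns, the same indexing lets you limit the columns at the same time, for example (assuming the first 50 columns are the ones you want to inspect):
View(df[1:500, 1:50])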