I need your help again :)
I wrote an R script that generates a heatmap from a given tab-separated txt or xls file. At the moment, I delete all columns I don't want in the heatmap by hand in the xls file.
Now I want to automate it, but I don't know how :(
The interesting columns all start the same in all xls files, followed by an individual name:
xls-file 1: L1_tpm_xxxx L2_tpm_xxxx L3_tpm_xxxx
xls-file 2: L1_tpm_xxxx L2_tpm_xxxx L3_tpm_xxxx L4_tpm_xxxx L5_tpm_xxxx
Any ideas how to select those columns?
Thanking you in anticipation, Philipp
You could use (if you have read your data into a data.frame df):
df <- df[,grep("^L[[:digit:]]+_tpm.*",colnames(df))]
or you can explicitly write the columns that you want:
df <- df[,c("L1_tpm_xxxx","L2_tpm_xxxx","L3_tpm_xxxx")]
etc...
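For a self-contained illustration of the grep approach (the column names below are invented to mimic the question's L*_tpm pattern):

```r
# Toy data frame mimicking the question's layout; names are made up
df <- data.frame(gene         = c("g1", "g2"),
                 L1_tpm_liver = c(1.2, 3.4),
                 L2_tpm_brain = c(5.6, 7.8),
                 notes        = c("a", "b"))

# Keep only columns whose names start with "L<digit>_tpm"
keep <- grep("^L[[:digit:]]+_tpm", colnames(df))
df <- df[, keep]
colnames(df)  # "L1_tpm_liver" "L2_tpm_brain"
```

This works unchanged on files with any number of matching columns, which is exactly what differs between your two xls files.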
If you think the column positions are going to be fixed across excel sheets, the simplest solution here is to just use column indices. For example, if you use read.table to import a tab-delimited text file as a data.frame, and then decide you'd prefer to only keep the first two columns, you might do something like this:
data <- read.table("path_to_file.txt", header = TRUE, sep = "\t")
data <- data[,1:2]
I'm extremely new to using R and I keep running into an issue with my current data set. The data set consists of several txt files, each with 30 rows and 3 columns of numerical data. However, when I try to work with them in R, it automatically makes the first row of data the column heading, so when I try to combine the files everything gets messed up, as none of them have the same column titles. How do I stop this from happening? The code I've used so far is below!
setwd("U:\\filepath")
library(readr)
library(dplyr)
file.list <- list.files(pattern='*.txt')
df.list <- lapply(file.list, read_tsv)
After this point it just says that there are 29 rows and 1 column, which is not what I want! Any help is appreciated!
You say:
After this point it just says that there are 29 rows and 1 column, which is not what I want!
What that is telling you is that you don't have a tab-separated file. You can't tell from the message which delimiter the file actually uses, but it's not a tab. You can tell that by paying attention to the number of columns: since you got only one column, the read_tsv function didn't find any tabs. And then you have the issue that your colnames are all different. That could well mean that your files do not have a header line. If you wanted to see what was in your files you could do something like:
df.list <- lapply(file.list, function(x) readLines(x)[1])
df.list[[1]]
If there are tabs, they will show up as literal \t characters in the printed strings (use cat() on the result to see them expanded).
Generally it is better to determine what delimiters exist by looking at the file with a text editor (but not MS Word).
Use df_list <- lapply(file.list, read_tsv, col_names = FALSE).
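Putting that together, here is a minimal self-contained sketch; the two temporary files below stand in for your real .txt files (which the question says are 30 x 3):

```r
library(readr)
library(dplyr)

# Two throw-away tab-separated files standing in for the real data
f1 <- tempfile(fileext = ".txt"); writeLines(c("1\t2\t3", "4\t5\t6"), f1)
f2 <- tempfile(fileext = ".txt"); writeLines("7\t8\t9", f2)

# col_names = FALSE stops read_tsv from treating the first data row as a
# header; every file gets the same automatic names X1, X2, X3
df.list <- lapply(c(f1, f2), read_tsv, col_names = FALSE)

# Identical column names mean the files now stack cleanly
combined <- bind_rows(df.list)
nrow(combined)  # 3
```

With your own data you would build the file vector with list.files(pattern = '*.txt') as in your original code.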
So I have a problem (that may or may not have a simpler solution than what I'm trying to do):
I have a csv:
df <- read.csv('dfPhotos.csv')
This csv includes an id column, each of which looks something like id_860139460671021056
I also have a group of png images matching the id for each row, named like 860139460671021056.png for example (such that for every id, there exists an image).
I want to be able to merge the images to the original csv in a for loop such that the last column in the dataset is the png file matching the identifier.
In the data frame, the column where I want the image names to go currently holds NA's.
Is this possible?
If it's not, is there a simple alternative making them retrievable in R?
Thanks in advance!
Do you want to add a new column by getting the id from the tweets.tweet_id column? We could remove the "id_" part of the identifier and paste ".png" at the end to do that.
df$tweets.image <- paste0(sub("id_", "", df$tweets.tweet_id), ".png")
#OR also
#df$tweets.image <- paste0(sub("id_(.*)", "\\1", df$tweets.tweet_id), ".png")
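A quick check of the transformation on a couple of made-up ids:

```r
# Sample ids in the question's id_<digits> format
ids <- c("id_860139460671021056", "id_123")

# Strip the "id_" prefix and append ".png"
files <- paste0(sub("id_", "", ids), ".png")
files  # "860139460671021056.png" "123.png"
```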
This seems like a silly question, but I really could not find a solution! I need to read only specific columns from an Excel file. The file has multiple sheets with different numbers of columns, but the ones I need to read will be there. I can do this for csv files, but not for Excel! This is my present code, which reads the first 14 columns (but the columns I need might not always be in the first 14). I can't just read them all, as rbind will throw an error because the sheets have different numbers of columns.
EDIT: I solved this by omitting the col_types parameter; it worked because the sheets with extra columns only had column headers and no data rows. Still, this is in no way a robust solution, so I hope someone can do a better job than me.
INV <- lapply(sheets, function(X) read_excel("./Inventory.xlsx", sheet = X, col_types = c(rep("text", 14))))
names(INV) <- sheets
INV <- do.call("rbind", INV)
I am trying to do something like this:
INV <- lapply(FILES[grepl("Inventory", FILES)],
function(n) read_csv(file=paste0(n), col_types=cols_only(DIVISION="c",
DEPARTMENT="i",
ITEM_ID="c",
DESCRIPTION="c",
UNIT_QTY="i",
COMP_UNIT_QTY="i",
REGION="c",
LOCATION_TYPE="c",
ZONE="c",
LOCATION_ID="c",
ATS_IND="c",
CONTAINER_ID="c",
STATUS="c",
TROUBLE_CODES="c")))
But I need this for an Excel file. I tried using read.xlsx from openxlsx and read_excel from readxl, but neither supported doing this. There must be some other way. Don't worry about column types; I am fine with all as characters.
I would very much appreciate if this can be done using readxl or openxlsx.
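One workaround, since readxl has no cols_only() equivalent: read each sheet in full, coerce everything to character, and subset to the wanted columns before rbind-ing, so extra columns in some sheets no longer matter. A sketch with toy data frames standing in for the sheets (in real use, each would come from read_excel("./Inventory.xlsx", sheet = s); the column names here are shortened from the question):

```r
wanted <- c("DIVISION", "ITEM_ID")  # the columns you actually need

# Toy stand-ins for two sheets that carry different extra columns
sheet1 <- data.frame(DIVISION = "A", ITEM_ID = "1", EXTRA = "x")
sheet2 <- data.frame(DIVISION = "B", ITEM_ID = "2", OTHER = "y", MORE = "z")

keep_wanted <- function(df) {
  df[] <- lapply(df, as.character)  # everything as character, as requested
  df[, wanted]                      # drop whatever else the sheet contains
}

# Uniform columns, so rbind no longer complains
INV <- do.call("rbind", lapply(list(sheet1, sheet2), keep_wanted))
```

The subsetting happens after the read, so this costs the I/O of reading the unwanted columns, but it is robust to sheets whose layouts differ.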
I'm brand new to R and am having difficulty with something very basic. I'm importing data from an excel file like this:
data1 <- read.csv(file.choose(), header=TRUE)
When I try to look at the data in the table by column, R doesn't recognize the column headers as objects. This is what it looks like:
summary(Square.Feet)
Error in summary(Square.Feet) : object 'Square.Feet' not found
I need to run a regression and I'm having the same problem. Any help would be much appreciated.
R does recognize them, but you have to tell it which data frame the column lives in:
summary(data1$Square.Feet)
Where "data1" is the name of your data frame, and after the dollar sign goes the name of the variable.
Hope it helps
UPDATE
As suggested below, you can use the following:
data1 <- read.csv(file.choose(), header=TRUE)
attach(data1)
This way, by doing "attach", you avoid writing the name of the dataset every time, so we would go from
summary(data1$Square.Feet)
To this point after attaching the data:
summary(Square.Feet)
However I DO NOT recommend doing it, because if you load other datasets you may mess everything up, since it's quite common for variables in different datasets to share names, among other major problems; see the many discussions of attach()'s pitfalls (thanks Ben Bolker for the pointers).
If you want a summary of all data fields, then
summary(data1)
or you can use the 'with' helper function
with(data1, summary(Square.Feet))
I am working on a large questionnaire - and I produce summary frequency tables for different questions (e.g. df1 and df2).
a<-c(1:5)
b<-c(4,3,2,1,1)
Percent<-c(40,30,20,10,10)
df1<-data.frame(a,b,Percent)
c<-c(1,1,5,2,1)
Percent<-c(10,10,50,20,10)
df2<-data.frame(a,c,Percent)
rm(a,b,c,Percent)
I normally export the dataframes as csv files using the following command:
write.csv(df1, file="df1.csv")
However, as my questionnaire has many questions and therefore dataframes, I was wondering if there is a way in R to combine different dataframes (say with a line separating them), and export these to a csv (and then ultimately open them in Excel)? When I open Excel, I therefore will have just one file with all my question dataframes in, one below the other. This one csv file would be so much easier than having individual files which I have to open in turn to view the results.
Many thanks in advance.
If your end goal is an Excel spreadsheet, I'd look into some of the tools available in R for directly writing an xls file. Personally, I use the XLConnect package, but there is also xlsx and also several write.xls functions floating around in various packages.
I happen to like XLConnect because it allows for some handy vectorization in situations just like this:
require(XLConnect)
#Put your data frames in a single list
# I added two more copies for illustration
dfs <- list(df1,df2,df1,df2)
#Create the xls file and a sheet
# Note that XLConnect doesn't seem to do tilde expansion!
wb <- loadWorkbook("/Users/jorane/Desktop/so.xls",create = TRUE)
createSheet(wb,"Survey")
#Starting row for each data frame
# Note the +1 to get a gap between each
n <- length(dfs)
rows <- cumsum(c(1,sapply(dfs[1:(n-1)],nrow) + 1))
#Write the file
writeWorksheet(wb,dfs,"Survey",startRow = rows,startCol = 1,header = FALSE)
#If you don't call saveWorkbook, nothing will happen
saveWorkbook(wb)
I specified header = FALSE since otherwise it will write the column header for each data frame. But adding a single row at the top in the xls file at the end isn't much additional work.
As James commented, you could use
merge(df1, df2, by="a")
but that would combine the data horizontally. If you want to combine them vertically you could use rbind:
rbind(df1, df2, df3,...)
(Note: the column names need to match for rbind to work).
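If the questions have different columns (so rbind won't work), another option that needs no extra packages is to append each table to the same csv with a blank line between them; Excel will then show them stacked one below the other. A sketch using the df1 and df2 from the question:

```r
a <- 1:5
df1 <- data.frame(a = a, b = c(4, 3, 2, 1, 1), Percent = c(40, 30, 20, 10, 10))
df2 <- data.frame(a = a, c = c(1, 1, 5, 2, 1), Percent = c(10, 10, 50, 20, 10))

out <- file.path(tempdir(), "all_questions.csv")
if (file.exists(out)) file.remove(out)

for (df in list(df1, df2)) {
  # write.table warns when appending column names to an existing file;
  # that is expected here, so the warning is suppressed
  suppressWarnings(
    write.table(df, out, sep = ",", row.names = FALSE, append = TRUE)
  )
  cat("\n", file = out, append = TRUE)  # blank line separating the tables
}
```

Each table keeps its own header row in the file, which is usually what you want when the questions have different column names.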