Merging png files to a csv as a column in R

So I have a problem (that may or may not have a simpler solution than what I'm trying to do):
I have a csv:
df <- read.csv('dfPhotos.csv')
This csv includes an id column whose values look something like id_860139460671021056.
I also have a group of png images matching the id of each row, named like 860139460671021056.png (so for every id, there exists an image).
I want to be able to merge the images to the original csv in a for loop such that the last column in the dataset is the png file matching the identifier.
Here's an example of the ID column; the NAs are where I want the images to be:
Is this possible?
If it's not, is there a simple alternative making them retrievable in R?
Thanks in advance!

Do you want to add a new column by taking the id from the tweets.tweet_id column? We could remove the "id_" part of the identifier and paste ".png" at the end to do that.
df$tweets.image <- paste0(sub("id_", "", df$tweets.tweet_id), ".png")
# Or, equivalently, with a capture group:
#df$tweets.image <- paste0(sub("id_(.*)", "\\1", df$tweets.tweet_id), ".png")
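A csv cell can only hold text, so one way to keep the images retrievable from R is a column of file paths rather than the PNG data itself. A minimal sketch, assuming all images sit in a single directory; `add_image_paths`, `img_dir`, and the `image_path` column are hypothetical names:

```r
# Hypothetical helper: map "id_<digits>" values to "<img_dir>/<digits>.png".
# Assumes every image lives directly inside img_dir.
add_image_paths <- function(df, id_col, img_dir) {
  ids <- sub("^id_", "", df[[id_col]])              # strip only the leading "id_"
  paths <- file.path(img_dir, paste0(ids, ".png"))
  # Store NA instead of a path when the expected image file is missing
  df$image_path <- ifelse(file.exists(paths), paths, NA_character_)
  df
}
```

Each path can then be loaded on demand, for example with `png::readPNG()` if the png package is installed, and the data frame can still be written back out with `write.csv()`.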

Related

Rename files and save a log of old and new filenames using R

I am trying to remove bias from a microscopy analysis, so I want to make it so the experimenter doesn't know what the conditions are for the image they are looking at.
To do this I need to rename every file in a directory so they can't be identified, but I also need to be able to know what the original filename was subsequently.
I made a folder with three files in it to try this out. I got the file list, made a vector of new names, and combined them into a data frame.
setwd("~/Desktop/folder1")
filename_list <- list.files("~/Desktop/folder1")
new_filenames <- c("anon1", "anon2", "anon3")
# A plain data.frame already pairs each original name with its new name; melt() is not needed
df1 <- data.frame(filename_list, new_filenames, stringsAsFactors = FALSE)
View(df1)
I've also been able to change names using scripts from a previous question and from R-bloggers, using sapply and file.rename. I got a little stuck using wildcards to select the whole filename (minus the extension), but I'm sure it's possible:
sapply(filename_list, FUN = function(eachPath) {
  file.rename(from = eachPath, to = sub(pattern = "image_", replacement = "anon", eachPath))
})
How can I apply the new_filenames vector with file.rename so that each new name corresponds to its original filename in the df1 data frame, or is there a better way to do this? Thanks.
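file.rename() is vectorised over `from` and `to`, so a whole vector of new names can be passed in one call instead of looping with sapply. A minimal sketch, assuming the log file is written outside the folder being renamed and that none of the new names already exist there; `anonymize_files` and its arguments are hypothetical:

```r
# Hypothetical helper: give every file in `dir` an anonymous name, keep its
# extension, and save an original -> anonymized key to `log_file`.
# Assumes log_file is outside dir and no new name already exists in dir.
anonymize_files <- function(dir, log_file, prefix = "anon") {
  old <- list.files(dir)
  if (length(old) == 0) {
    return(invisible(data.frame(original = character(0), anonymized = character(0))))
  }
  ext <- tools::file_ext(old)
  # Shuffle the numbers so the new order does not mirror the original order
  num <- sample.int(length(old))
  new <- paste0(prefix, num, ifelse(ext == "", "", paste0(".", ext)))
  key <- data.frame(original = old, anonymized = new, stringsAsFactors = FALSE)
  # Save the key before renaming so it exists even if a rename fails
  write.csv(key, log_file, row.names = FALSE)
  # Element k of `from` is renamed to element k of `to`
  ok <- file.rename(file.path(dir, old), file.path(dir, new))
  if (!all(ok)) warning("some files could not be renamed")
  invisible(key)
}
```

The shuffle matters for blinding: numbering the files in list.files() order would keep them in alphabetical order, which can still hint at the condition. Call set.seed() first if the assignment has to be reproducible.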

R: How to add multiple variables at the end of a data frame from the file name

Apologies if this is a trivial question. I saw others like it, such as "How can I turn a part of the filename into a variable when reading multiple text files into R?", but I still seem to be having some trouble...
I have been given 50000 .txt files. Each file contains a single observation (a single row of data) with exactly 12 variables (number of columns). The name of each .txt file is fairly regular. Specifically, each .txt file has a code at the end indicating the type of observation across three dimensions. An example of this code is 'VL-VL-NE' or 'VL-M-N' or 'H-H-L' (not including the apostrophes). Therefore, an example of a file name could be 'I-love-using-R-20_01_2016-VL-VL-NE.txt'.
My problem is that I want to include the code from the end of each file name in the imported data itself when I import into R, i.e., I want to add three more variables (columns) at the end of the table corresponding to the three parts of the code at the end of the file name.
Any help would be greatly appreciated.
Because you have exactly the same number of columns in each file, why don't you import them into R using a loop over all .txt files in a particular directory?
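df <- NULL
for (x in list.files(pattern = "\\.txt$")) {  # pattern is a regex, not a glob
  u <- read.csv(x, skip = 6)   # adjust skip/header to match the layout of your files
  u$Label <- factor(x)         # a column holding the file name
  df <- rbind(df, u)
}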
You'll note that the file name itself becomes a column. Once everything is in R, it should be fairly easy to use a regex function to extract the exact elements you need from the file name column (df$Label).
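One way to do that regex step is regexec() with three capture groups. A minimal sketch, assuming the code is always the last three hyphen-separated tokens before ".txt"; `add_code_columns` and the column names code1 to code3 are hypothetical:

```r
# Hypothetical helper: split the trailing three-part code of names such as
# "I-love-using-R-20_01_2016-VL-VL-NE.txt" into three new columns.
# Assumes the code is the last three hyphen-separated tokens before ".txt".
add_code_columns <- function(df, label_col = "Label") {
  fn <- as.character(df[[label_col]])   # Label is stored as a factor in the loop
  m <- regmatches(fn, regexec("-([^-]+)-([^-]+)-([^-]+)\\.txt$", fn))
  # m[[k]] holds the full match followed by the three captured tokens;
  # a name that does not match gives character(0), which becomes NAs here
  parts <- t(vapply(m, function(x) x[2:4], character(3)))
  colnames(parts) <- c("code1", "code2", "code3")
  cbind(df, as.data.frame(parts, stringsAsFactors = FALSE))
}
```

Because `[^-]+` cannot cross a hyphen and the pattern is anchored with `$`, only the final three tokens can match, however many hyphens appear earlier in the name.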

Saving a txt file as a delimited csv file in R

I have the following code to read a file and save it as a csv file. I remove the first 7 lines of the text file and also the 3rd column, since I only need the first two columns.
# i comes from an enclosing loop (not shown)
current_file <- paste0("Experiment 1 ", i, ".cor")
curfile <- list.files(pattern = current_file)
curfile_data <- read.table(curfile, header = FALSE, skip = 7, sep = ",")
# Drop the third column, keeping V1 and V2
curfile_data <- curfile_data[-grep('V3', colnames(curfile_data))]
# Note: this overwrites the original .cor file with comma-separated output
write.csv(curfile_data, curfile)
new_file <- paste0("Dev_C", i, ".csv")
new_file
file.copy(curfile, new_file)
curfile thus holds two columns, V1 and V2, preceded by the observation-number column.
When I use file.copy to copy the contents of curfile into a .csv file and then open the new .csv file in Excel, all the data appears concatenated in a single column. Is there a way to show each column separately? Thanks in advance for your suggestions.
The data in the .txt file looks like this:
"","V1","V2","V3"
"1",-0.02868862,5.442283e-11,76.3
"2",-0.03359281,7.669754e-12,76.35
"3",-0.03801883,-1.497323e-10,76.4
"4",-0.04320051,-6.557672e-11,76.45
"5",-0.04801207,-2.557059e-10,76.5
"6",-0.05325544,-9.986231e-11,76.55
You need to use Excel's Text to Columns feature, selecting comma as the delimiter. Where this option appears in the menus depends on the version of Excel you are using.
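An alternative to Text to Columns, assuming the cause is Excel's regional list separator (for example, a locale that uses a comma as the decimal mark): write.csv2() writes ";" as the field separator and "," as the decimal mark. Whether Excel then splits the columns on opening depends on its regional settings. This sketch also writes straight to the new file instead of overwriting the input and copying it; `convert_for_excel` is a hypothetical helper:

```r
# Hypothetical helper: keep V1 and V2 of a comma-separated file after skipping
# `skip` header lines, and write them with ';' separators and ',' decimals.
convert_for_excel <- function(in_file, out_file, skip = 7) {
  dat <- read.table(in_file, header = FALSE, skip = skip, sep = ",")
  dat <- dat[, c("V1", "V2")]                   # keep only the first two columns
  write.csv2(dat, out_file, row.names = FALSE)  # ';' separator, ',' decimal mark
  invisible(dat)
}
```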

Importing Multiple Text Files as Individual data.frames in R - 2.15.2 on Mac - 10.8.4

I have searched through this forum for most of the day trying to find a solution and could not find one, so I am posting. If the answer is already out there, please point me in the right direction.
What I have -
A directory with 40 text files named as follows:
test_63x_disc_z00.txt
*01.txt
*02.txt
...
*39.txt
In each of these files there are 10 columns of data with no header and a varying number of rows.
What I want -
I want to have an individual data.frame in R for each text file with names:
file00
file01
...
file39.
I then want to add a header (column names) to each of these data.frames.
I then want to be able to manipulate the data easily (I can sort out this last part once I have read in the data).
This is what I have accomplished (don't laugh now) -
I can read a single text file in as a data frame and add a header, like so:
d<-read.delim("test_63x_disc_z00.txt", header = F)
colnames(d)<-c("cell","CentX","CentY","CountLabels","AvgGreen","DeviationsGreen","AvgRed","DeviationsRed","GUI-ID","Slice")
I am not sure how to set up a loop to perform each of the commands to all 40 files and maintain distinct file names.
To quickly read in many data frames, you can use:
files <- list.files(pattern = "\\.txt$")
listy <- lapply(files, read.table, sep = "", header = FALSE)
Then, to name the list items:
names(listy) <- paste0("file", seq_along(listy))
They can then be accessed as listy$file1, etc.
Thanks Metrics for editing my input code... I wasn't sure how to give it the format you did.
So I figured it out (dirty version), but it still needs work -
stem <- c("/Users/stefanzdraljevic/Northwestern/2013/Carthew-Rotation/Sample-Images-Ritika/ritika-tes/statistics3/test_63x_disc_z")
The above is the stem for the file names.
addition <- c("00","01","02","03","04","05","06","07","08","09","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39")
This will add the number of the text file to the end of stem. I am not sure how to incorporate the "00" numbering structure without writing them all out.
colnames <- c("cell","CentX","CentY","CountLabels","AvgGreen","DeviationsGreen","AvgRed","DeviationsRed","GUI-ID","Slice")
This will add column names to the data.frame
data <- NULL
for (j in stem) {
  data[[j]] <- NULL
  for (i in addition) {
    data[[j]][[i]] <- read.table(paste(j, i, ".txt", sep = ""), header = FALSE, col.names = colnames)
  }
}
This loop does the trick.
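The zero-padded "00" to "39" suffixes can be generated with sprintf() instead of being typed out, which also avoids accidentally skipping a number. A minimal sketch, assuming the files are named `<stem>00.txt`, `<stem>01.txt`, and so on; `read_slices` and its arguments are hypothetical:

```r
# Hypothetical helper: read <stem>00.txt, <stem>01.txt, ... into a named list.
read_slices <- function(stem, n, col_names) {
  suffix <- sprintf("%02d", seq_len(n) - 1L)  # "00", "01", ...; up to "39" for n = 40
  files <- paste0(stem, suffix, ".txt")
  # check.names = FALSE keeps "GUI-ID" as written instead of turning it into "GUI.ID"
  out <- lapply(files, read.table, header = FALSE,
                col.names = col_names, check.names = FALSE)
  names(out) <- paste0("file", suffix)        # file00, file01, ...
  out
}
```

If separate objects named file00 to file39 are really needed, `list2env(slices, envir = globalenv())` creates them from the named list, although keeping them together in one list usually makes later looping easier.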

Select columns for heatmap in R

I need your help again :)
I wrote an R script that generates a heatmap from a given tab-separated txt or xls file. At the moment, I delete all the columns I don't want in the heatmap by hand in the xls file.
Now I want to automate this, but I don't know how :(
The columns I am interested in start the same way in every xls file, followed by an individual name:
xls-file 1: L1_tpm_xxxx L2_tpm_xxxx L3_tpm_xxxx
xls-file 2: L1_tpm_xxxx L2_tpm_xxxx L3_tpm_xxxx L4_tpm_xxxx L5_tpm_xxxx
Any ideas how to select those columns?
Thanking you in anticipation, Philipp
If you have read your data into a data.frame df, you could use:
df <- df[, grep("^L[[:digit:]]+_tpm", colnames(df)), drop = FALSE]
or you can explicitly write the columns that you want:
df <- df[,c("L1_tpm_xxxx","L2_tpm_xxxx","L3_tpm_xxxx")]
etc.
If the column positions are fixed across Excel sheets, the simplest solution is to use column indices. For example, if you use read.table to import a tab-delimited text file as a data.frame and then decide to keep only the first two columns, you might do something like this:
data <- read.table("path_to_file.txt", header=T, sep="\t")
data <- data[,1:2]
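Whichever selection method is used, stats::heatmap() expects a numeric matrix, so the selected columns still need converting. A minimal sketch, assuming all L<number>_tpm_ columns are numeric; `tpm_matrix` is a hypothetical helper:

```r
# Hypothetical helper: keep only the L<number>_tpm_* columns and return them
# as a numeric matrix, the input type stats::heatmap() expects.
tpm_matrix <- function(df) {
  keep <- grepl("^L[0-9]+_tpm_", colnames(df))
  as.matrix(df[, keep, drop = FALSE])  # drop = FALSE keeps a matrix even for one column
}

# Possible usage, reading the tab-separated file first:
# data <- read.delim("path_to_file.txt")
# heatmap(tpm_matrix(data))
```

Note that heatmap() needs at least two rows and two columns, so a file with a single matching column would need a different plot.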
