I'm still a rookie in the R world, in a very accelerated class with limited/no guidance. My assignment is to build a custom function that reads in a specific .csv and takes some specific columns out to be analyzed. Could anyone please offer some advice? The "sample code" I was given looks like this:
AnnualLekSurvey=function(data.in,stat.year){
d1=subset(data.in,year==stat.year)
d2=d1[c("year","complex","tot_male")]
attach(d2)}
So when it's complete and I run it, I should be able to say:
AnnualLekSurvey(gsg_lek,2006)
where "gsg_lek" is the name of the file I want to import, and 2006 is the value from the "year" column that I want to subset on. "complex" and "tot_male" will be the variables to be analyzed by "year", but I'm not worried about that code right now.
What I'm confused about is: how do I tell R that gsg_lek is a .csv file, and tell it to look in the proper directory for it when I run the custom function?
I saw one other vaguely similar example on here, and they had to use the if() and paste() commands to build the string of the file name - that seems like too much arbitrary work, unless I'm just being lazy...
Any help would be appreciated.
You can make a function like this:
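AnnualLekSurvey <- function(csvFile, stat.year)
{
  # Build the full path from the base name and read the file
  d1 <- read.csv(paste0("C:/", csvFile, ".csv"), header = TRUE, sep = ",")
  # Keep only the rows for the requested year
  d2 <- subset(d1, year == stat.year)
  # Keep only the columns needed for the analysis
  d2 <- d2[, c("year", "complex", "tot_male")]
  return(d2)
}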
The argument 'csvFile' in the function is the base name of your csv file, without the .csv extension. In this particular example, the file has to be in your C:/ folder. If your file is in some other folder, you have to change the "C:/" in the function to the folder where your csv file is located.
Running the function:
data <- AnnualLekSurvey("gsg_lek", "2006")
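Note that the file name has to be in quotes. The year works either quoted or as a plain number (2006), because R coerces the two sides of year==stat.year to a common type. 'data' will now contain the columns year, complex and tot_male of gsg_lek.csv corresponding to the year 2006.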
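If you would rather not hard-code the folder, one option is to pass it in and let file.path() assemble the path. This is only a sketch: the dir argument is a hypothetical addition that is not part of the assignment, and it defaults to the current working directory.

```r
# Sketch only: `dir` is a hypothetical extra argument, not part of the assignment.
AnnualLekSurvey <- function(csvFile, stat.year, dir = getwd()) {
  # Build "<dir>/<csvFile>.csv" without pasting separators by hand
  path <- file.path(dir, paste0(csvFile, ".csv"))
  d1 <- read.csv(path, header = TRUE)
  # Keep only the rows for the requested year and the three columns of interest
  d1[d1$year == stat.year, c("year", "complex", "tot_male")]
}
```

With the working directory set to the folder that holds gsg_lek.csv, AnnualLekSurvey("gsg_lek", 2006) would then be enough.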
Related
In order to conduct some analysis using a particular software, I am required to have separate ".dat" files for each participant, with each file named as the participant number, all saved in one directory.
I have tried to do this using the "write.dat" function in R (from the 'multiplex' package).
I have written a loop that outputs a ".dat" file for each participant in a dataset. I would like each file that is outputted to be named the participant number, and for them all to be stored in the same folder.
## Using write.dat
participants_ID <- unique(newdata$SJNB)
for (i in 1:length(participants_ID)) {
data_list[[i]] <- newdata %>%
filter(SJNB == participants_ID[i])
write.dat(data_list[[i]], paste0("/Filepath/Directory/", participants_ID[i], ".dat"))
}
## Using write_csv this works perfectly:
participants_ID <- unique(newdata$SJNB)
for (i in 1:length(participants_ID)) {
newdata %>%
filter(SJNB == participants_ID[i]) %>%
write_csv(paste0("/Filepath/Directory/", participants_ID[i], ".csv"), append = FALSE)
}
If I use the function "write_csv", this works perfectly (saving .csv files for each participant). However, if I use the function "write.dat" each participant file is saved inside a separate folder - the folder name is the participant number, and the file inside the folder is called "data_list[[i]]". In order to get all of the data_list files into the same directory, I then have to rename them which is time consuming.
I could theoretically output the files to .csv and then convert them to .dat, but I'm just intrigued to know if there's anything I could do differently to get the write.dat function to work the way I'm trying it :)
The documentation on write.dat is minimal at best, but it would appear that you have confused a directory path with a file name. You have in effect created a directory named "/Filepath/Directory/[participants_ID[i]].dat" and that's where each output file is placed. That you cannot assign a name to the x.dat file itself appears to be a defect in the package as supplied.
However, not all is lost. Inside your loop, replace your write.dat line with the following lines, or something similar (not tested):
edit
It occurs to me that there's a smoother solution, albeit using the dreaded eval:
Again inside the loop, (assuming participants_ID[i] is a char string)
eval(parse(text = paste0(participants_ID[i], ' <- data_list[[i]]')))
write.dat(participants_ID[i], "/Filepath/Directory/")
previous answer
write.dat(data_list[[i]], "/Filepath/Directory/")
thecommand = paste0('mv "/Filepath/Directory/data_list[[i]]" "/Filepath/Directory/', participants_ID[i], '.dat"')
system(thecommand)
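If the target software only needs plain delimited text with a .dat extension, base R's write.table() can write each participant's rows straight to a file named after the participant, which avoids the renaming step. Note that this sketch swaps write.table() in for write.dat(), so the file layout (space-separated, with a header row) is an assumption that may need adjusting, and write_participant_files is a hypothetical helper name.

```r
# Hypothetical helper: one "<ID>.dat" file per participant, all in `out_dir`.
# Uses base write.table(), NOT multiplex::write.dat(), so the file layout
# (space-separated text with a header row) is an assumption.
write_participant_files <- function(newdata, out_dir, id_col = "SJNB") {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  pieces <- split(newdata, newdata[[id_col]])   # one data frame per participant
  for (id in names(pieces)) {
    write.table(pieces[[id]], file.path(out_dir, paste0(id, ".dat")),
                row.names = FALSE, quote = FALSE)
  }
  invisible(names(pieces))
}
```

Called as write_participant_files(newdata, "/Filepath/Directory"), this should leave every participant's file side by side in that one folder.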
I am trying to remove bias from a microscopy analysis, so I want to make it so the experimenter doesn't know what the conditions are for the image they are looking at.
To do this I need to rename every file in a directory so they can't be identified, but I also need to be able to know what the original filename was subsequently.
I made a folder with three files in it to try this out. I got the file list and made a vector for the new names, and combined them into a data frame.
setwd("~/Desktop/folder1")
filename_list<-list.files("~/Desktop/folder1")
new_filenames <- c("anon1", "anon2", "anon3")
require(reshape2)
df1 <- melt(data.frame(filename_list,new_filenames))
View(df1)
I've also been able to change names using scripts from a previous question and R-bloggers, using sapply and file.rename. I got a little stuck with using wildcards in this to select the whole filename (minus extension), but I'm sure it's possible:
sapply(filename_list,FUN=function(eachPath){file.rename(from=eachPath,to=sub(pattern="image_",replacement="anon",eachPath))})
How can I get the new_filenames vector and apply it to file.rename so that it corresponds to the original filenames (filename_list) in the df1 data frame, or is there a better way to do this? Thanks.
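One possible approach, sketched under some assumptions: file.rename() accepts whole vectors for from and to, so the old and new names can be passed in one call and returned together as the key. The helper name anonymise_files is hypothetical, and the sketch assumes the folder contains only the image files and nothing already named anon1, anon2, and so on.

```r
# Hypothetical helper: rename every file in `dir` to anon1, anon2, ...
# (keeping the extension) and return the old/new key as a data frame.
# Assumes `dir` holds only the files to be anonymised.
anonymise_files <- function(dir) {
  old <- list.files(dir)
  stopifnot(length(old) > 0)
  ext <- tools::file_ext(old)
  # Shuffle the numbers so the new names do not follow the original sort order
  new <- paste0("anon", sample(seq_along(old)),
                ifelse(ext == "", "", paste0(".", ext)))
  # file.rename() is vectorised: element i of `from` becomes element i of `to`
  ok <- file.rename(file.path(dir, old), file.path(dir, new))
  data.frame(original = old, anonymous = new, renamed = ok)
}
```

The returned data frame is the only record of the mapping, so it would need to be saved somewhere the experimenter does not look, for example with write.csv() to a file outside the image folder.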
I'm new to R and programming and taking a Coursera course. I've asked in their forums, but nobody there seems able to provide an answer. To be clear, I'm trying to determine why this does not produce any output.
When I first wrote the program, I was getting accurate outputs, but after I tried to upload, something went wonky. Rather than producing any output with [1], [2], etc. when I run the program from RStudio, I only get the blue + prompts, with no errors, and nothing I change produces an output.
I tried with a previous version of R, and reinstalled the most recent version 3.2.1 for Windows.
What I've done:
Set the correct working directory through RStudio
pol <- function(directory, pol, id = 1:332) {
files <- list.files("specdata", full.names = TRUE);
data <- data.frame();
for (i in ID) {
data <- rbind(data, read.csv(files_list[i]))
}
subset <- subset(data, ID %in% id);
polmean <- mean(subset[pol], na.rm = TRUE);
polmean("specdata", "sulfate", 1:10)
polmean("specdata", "nitrate", 70:72)
polmean("specdata", "nitrate", 23)
}
Can someone please provide some direction - debug help?
when I adjust the code the following errors tend to appear:
ID not found
Missing or unexpected } (although I've matched them all).
The updated code is as follows, if I'm understanding:
data <- data.frame();
files <- files[grepl(".csv",files)]
pollutantmean <- function(directory, pollutant, id = 1:332) {
pollutantmean <- mean(subset1[[pollutant]], na.rm = TRUE);
}
Looks like you haven't declared what ID is (I assume: a vector of numbers)?
Also, using 'subset' as a variable name while it's also a function, and pol as both a function name and the name of one of the arguments of that same function is just asking for trouble...
And I think there is a missing ")" in your for-loop.
EDIT
So the way I understand it now, you want to do a couple of things.
1. Read in a bunch of files, which you'll use multiple times without changing them.
2. Get some mean value out of those files, under different conditions.
Here's how I would do it.
Since you only want to read in the data once, you don't really need a function to do this (you can have one, but I think it's overkill for now). You correctly have code that makes a vector with the file names and then loops over them, rbinding them to each other. The problem is that this can become very slow. Check here. Make sure your directory only contains files that you want to read in, so no R scripts or other stuff. A way (not 100% foolproof) to do this is using files <- files[grepl(".csv",files)], which makes sure you only have the csv's. grepl checks whether a pattern matches inside each string and returns a logical vector; the [] then keeps only the elements for which TRUE was returned. It is not foolproof because the pattern is a regular expression, so the "." matches any character and the match can occur anywhere in the name.
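As a rough sketch of that reading step: the helper name read_all_csv is hypothetical, the pattern is anchored so that only names ending in a literal ".csv" are kept, and the files are bound together once at the end instead of growing a data frame inside a loop.

```r
# Hypothetical helper: read every .csv in `directory` into one data frame.
read_all_csv <- function(directory) {
  # "\\.csv$" matches a literal ".csv" at the end of the name only
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # Read everything first, then bind once, rather than rbind-ing inside a loop
  do.call(rbind, lapply(files, read.csv))
}
```

Something like df <- read_all_csv("specdata") would then be run once, before any of the mean calculations.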
Next, there is 'a thing you want to do multiple times', namely getting out mean values. This is where you'd use a function. Apparently you want to get the mean for different types of pollution, and you want this in restricted IDs.
Let's assume that step 1 has given you a data frame df with a column named Type for the type of pollution and a column called Id that represents some sort of ID (substitute the actual names in your script; if you don't have a column for ID, I'll edit the answer later on). Now you want a function
polmean <- function(type, id) {
# some code that returns the mean of a restricted version of df
}
This is all you need. You write the code that generates df, you then write a function that will get you what you want from that dataframe, and then you call it for the circumstances you want to use it in (the three polmean calls at the end of your original code, but now without the first argument as you no longer need this).
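To make that concrete, here is one way the body could look. It is a sketch on a toy data frame: the Type and Id column names follow the description above, while Value is a hypothetical name for the measurement column, and the extra data argument is only there so the function is not tied to one global object.

```r
# Toy stand-in for the data frame built in step 1. The column names Type and
# Id follow the text above; Value is a hypothetical name for the measurement.
df <- data.frame(
  Type  = c("sulfate", "sulfate", "nitrate", "nitrate", "sulfate"),
  Id    = c(1, 2, 1, 2, 3),
  Value = c(2, 4, NA, 6, 10)
)

polmean <- function(type, id, data = df) {
  # Rows of the requested pollution type whose Id is among the requested ids
  keep <- data$Type == type & data$Id %in% id
  mean(data$Value[keep], na.rm = TRUE)
}

polmean("sulfate", 1:2)   # mean of 2 and 4
polmean("nitrate", 1:2)   # the NA is dropped, leaving 6
```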
Ok - I finally solved this. Thanks for the help.
I didn't need to call "specdata" in line 2. the directory in line 1 referred to the correct directory.
My for/in statement needed to refer to the id in the first line, not the ID in the dataset. The for/in statement doesn't appear to need to be indented (but it looks cleaner).
I did not need a subset
The last 3 lines for pollutantmean did not need to be a part of the program. These are used in the R console to call the results one by one.
Thank you in advance for your help. I am using R to analyse some data that is initially created in Matlab. I am using the package "R.matlab" and it is fantastic for one file, but I am struggling to import multiple files.
The working script for a single file is as follows...
install.packages("R.matlab")
library(R.matlab)
x<-("folder_of_files")
path <- system.file("/home/ashley/Desktop/Save/2D Stream", package="R.matlab")
pathname <- file.path(x, "Test0000.mat")
data1 <- readMat(pathname)
And this works fantastically. The format of my files is 'Name_0000.mat', where the name is constant between files and the 4 digits increase, but not necessarily by 1.
My attempt to load multiple files at once was along these lines...
for (i in 1:length(temp))
data1<-list()
{data1[[i]] <- readMat((get(paste(temp[i]))))}
And also in multiple other ways that included and excluded path and pathname from the loop, all of which give me the same error:
Error in get(paste(temp[i])) :
object 'Test0825.mat' not found
Where 0825 is my final file name. If you change the length of the loop it is always just the name of the final one.
I think the issue is that when it pastes the name it looks for an object with that name, which does not exist yet, so I need to have the pasted text in speech marks, but I don't know how to do that.
Sorry this was such a long post....Many thanks
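As far as the posted loop shows, two things combine to give that error. The line after the for statement, data1<-list(), becomes the loop body, so the braced block runs only once afterwards with i left at its last value. And get() looks up an R object with that name, whereas readMat() just wants the file name as a string. A minimal sketch of an alternative, where read_all is a hypothetical helper and the reader function is passed in:

```r
# Hypothetical helper: apply `reader` to every file in `dir` whose name
# matches `pattern`, returning a list named by file name.
read_all <- function(dir, pattern, reader) {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  out <- lapply(paths, reader)   # the path string goes straight to the reader
  names(out) <- basename(paths)
  out
}

# With R.matlab installed, the call for the files described above might be:
# data1 <- read_all("folder_of_files", "^Test[0-9]{4}\\.mat$", R.matlab::readMat)
```

Because list.files() returns whatever matches, it does not matter that the four digits do not increase by 1.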
Guys, thanks for reading this. This is my first time writing a program, so pardon me if I ask stupid questions.
I have a bunch of .csv files named like: 001-XXX.csv; 002-XXX.csv ... 150-XXX.csv. Here XXX is a very long name tag, so it's a little annoying that every time I need to type read.csv("001-xxx.csv"). I want to make a function called "newread" that only asks me for the first three digits, the real id number, to read the .csv files. I thought "newread" should be like this:
newread <- function(id){
as.character(id)
a <- paste(id,"-XXX.csv",sep="")
read.csv(a)
}
But R shows Error: unexpected '}' in "}". What's going wrong? It looks logical.
I am running Rstudio on Windows 8.
as.character(id) on its own will not change id into a character string, because the result is never assigned. Change it to:
id = as.character(id)
Edit: According to comments, you should call newread() with a character argument such as newread("001"), because newread(001) and newread(1) are identical: R reads 001 as the number 1.
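Another option, if you would rather type newread(1), is to let sprintf() add the leading zeros. This is a sketch: id_to_filename is a hypothetical helper, and "XXX" stands in for the long name tag as in the question.

```r
# Hypothetical helper: turn an id such as 1 or "001" into "001-XXX.csv".
# "XXX" stands in for the long name tag, as in the question.
id_to_filename <- function(id, tag = "XXX") {
  # %03d pads the integer with zeros to three digits
  sprintf("%03d-%s.csv", as.integer(id), tag)
}

newread <- function(id, tag = "XXX") {
  read.csv(id_to_filename(id, tag))
}
```

With this version newread(1), newread(001) and newread("001") would all look for the same file.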
This is not specifically an answer to your question (others have covered that), but rather some advice that may be helpful for accomplishing your task in a different way.
First, some of the GUIs for R have file name completion. You can type the first part: read.csv("001- and then hit a key or combination of keys (in the Windows GUI you press TAB) and the rest of the filename will be filled in for you (as long as it is unique).
You can use the file.choose or choose.files functions to open a dialog box to choose your file using the mouse: read.csv(file.choose()).
If you want to read in all the above files then you can do this in one step using lapply and either sprintf or list.files (or others):
mycsvlist <- lapply( 1:150, function(x) read.csv( sprintf("%03d-XXX.csv", x) ) )
or
mycsvlist <- lapply( list.files(pattern="\\.csv$"), read.csv )
You could also use list.files to get a list of all the files matching a pattern and then pass one of the returned values to read.csv:
tmp <- list.files(pattern="001.*csv$")
read.csv(tmp[1])