For-loop error - values not being entered into column - r

I have a dataframe with one column containing a bunch of sound file filenames (.wav). I would like to measure something within each sound file, and have each measurement listed in a second new column next to the corresponding filename. In my code, each measurement is not being put into the second column. If I take each line of the for-loop and run it independently with the values 1, 2, 3 etc substituted for i, the resulting dataframe has the output measurements correctly entered.
I have written a toy example below, which cannot be run without some wav files, but perhaps the problem can be spotted based on the code alone:
library(seewave); library(tuneR)
setwd("D:/wavs")
#make a dataframe containing a column of wav filenames
file_list <-data.frame(c("wav1.wav", "wav2.wav", "wav3.wav"))
colnames(file_list)[1] <-"filelist" #give it a sensible column name
file_list$filelist <-as.character(file_list$filelist) #convert from factor
file_list$mx <-NA #a new empty vector for the measurement results
str(file_list) #this is how it will look before adding measurements
#now read in each wav file, measure something,
#and put the outcome in the empty cell next to that corresponding filename.
for (i in length(file_list$filelist)){
a <-as.character(file_list$filelist[[i]]) #this seemed to be a requirement, that 'wav1.wav' etc be a character
temp <-readWave(a) #read the file using tuneR package
mx <-max(range(temp#left)) #take some measurement from the left channel
file_list$mx[[i]] <-mx #put it in a new column next to the original filename
rm(mx); rm(temp); rm(a) #kill unnecessary things before starting again, for just in case
}
Needless to say, I have scoured the web and Stackoverflow for guidance without success, and tried a bunch of things (e.g. using }next{). Perhaps I need something similar to: file_list2$mx[i,] <-mx. Maybe some easy points for someone? Thank you, I am always grateful for help on SO.

The only problem with your code is that, You have not iterated the for loop. Means as per your code the for loop runs only one time i.e i=3. The correct code given below:
for (i in 1:length(file_list$filelist)){
a <-as.character(file_list$filelist[[i]]) #this seemed to be a requirement, that 'wav1.wav' etc be a character
temp <-readWave(a) #read the file using tuneR package
mx <-max(range(temp#left)) #take some measurement from the left channel
file_list$mx[[i]] <-mx #put it in a new column next to the original filename
rm(mx); rm(temp); rm(a) #kill unnecessary things before starting again, for just in case
}
Hope this could help you !

How about a tidyverse solution?
library(dplyr)
library(purrr)
data.frame(filelist=c("wav1.wav", "wav2.wav", "wav3.wav"), stringsAsFactors = F) %>%
mutate(mx = map_dbl(filelist, ~ readWave(.x)#left %>% max))

Related

Writing For Loop or Split function to separate data from Master data frame into smaller data frames

I am once again asking for your help and guidance! Super duper novice here so I apologize in advance for not explaining things properly or my general lack of knowledge for something that feels like it should be easy to do.
I have sets of compounds in one "master" list that need to be separated into smaller list. I want to be able to do this with a "for loop" or some iterative function so I am not changing the numbers for each list. I want to separate the compounds based off of the column "Run.Number" (there are 21 Run.Numbers)
Step 1: Load the programs needed and open File containing "Master List"
# tMSMS List separation
#Load library packages
library(ggplot2)
library(reshape)
library(readr) #loading the csv's
library(dplyr) #data manipulation
library(magrittr) #forward pipe
library(openxlsx) #open excel sheets
library(Rcpp) #got this from an error code while trying to open excel sheets
#STEP 1: open file
S1_MasterList<- read.xlsx("/Users/owner/Documents/Research/Yurok/Bioassay/Bioassay Data/220410_tMSMS_neg_R.xlsx")
Step 2: Currently, to go through each list, I have to change the "i" value for each iteration. And I also must change the name manually (Ctrl+F), by replacing "S2_Export_1" with "S2_Export_2" and so on as I move from list to list. Also, when making the smaller list, there are a handful of columns containing data that need to be removed from the “Master List”. The specific format of column names are so it will be compatible with LC-MS software. This list is saved as a .csv file, again for compatibility with LC-MS software
#STEP 2: Iterative
#Replace: S2_Export_1
i=1
(S2_Separate<- S1_MasterList[which(S1_MasterList$Run.Number == i), ])
%>%
(S2_Export_1<-data.frame(S2_Separate$On,
S2_Separate$`Prec..m/z`,
S2_Separate$Z,
S2_Separate$`Ret..Time.(min)`,
S2_Separate$`Delta.Ret..Time.(min)`,
S2_Separate$Iso..Width,
S2_Separate$Collision.Energy))
(colnames(S2_Export_1)<-c("On", "Prec. m/z", "Z","Ret. Time (min)", "Delta Ret. Time (min)", "Iso. Width", "Collision Energy"))
(write.csv(S2_Export_1, "/Users/owner/Documents/Research/Yurok/Bioassay/Bioassay Data/Runs/220425_neg_S2_Export_1.csv", row.names = FALSE))
Results: The output should look like this image provided below, and for this one particular data frame called "Master List", there should be 21 smaller data frames. I also want the data frames to be named S2_Export_1, S2_Export_2, S2_Export_3, S2_Export_4, etc.
First, select only required columns (consider processing/renaming non-syntactic names first to avoid extra work downstream):
s1_sub <- select(S1_MasterList, Sample.Number, On, `Prec..m/z`, Z,
`Ret..Time.(min)`, `Delta.Ret..Time.(min)`,
Iso..Width, Collision.Energy)
Then split s1_sub into a list of dataframes with split()
s1_split <- split(s1_sub, s1_sub$Sample.Number)
Finally, name the resulting list of dataframes with setNames():
s1_split <- setNames(s1_split, paste0("S2_export_", seq_along(s1_split))

R- how to fix memory issues in R?

In all honesty, I'm not quite sure what the issue is but I've had a similar issue in R in the past. I've written code to extract the variables I want from .dat files (specifically the current panel survey). I have CSV files that contain the positions of each variable by year (positions change by year). For example, HRFS12M1 in 2010 is 1173-1174, and in 2019 1223-1224 (and this part of the code is not shown but works so I didn't include it). So I have two folders and two separate directories one with the positions and one with the .dat files. I first loop through the positions files and create dfs with positions for each year (2010-2019). After the position dfs are generated I run the code below to obtain what variables I want in a large merged df. Now the code works as intended when I select 4 or fewer variables in the varList. However, the moment I try to use more variables the df starts to produce values that aren't within those columns. Does anyone know why it's doing this? I've tried several different variables to confirm it's not a problem with the position files but a problem with the number of variables.
#Loop through list of .dat files (lst2 contains name of files example:2010dec.dat)
for(i in 1:length(lst2)){
#Import the data cps data set
temp_cps<-readLines(lst2[i])
#Get the positions of the relevant year
temp_pos<- get(paste("Year", i, sep = "."))
#List of Variables we are looking at (can't use more than 4)
**varList=c("HRYEAR4","GESTFIPS","HESP1","HRFS12M1")**
#Get positions only for the variables selected
temp_pos=temp_pos[grep(paste(varList, collapse="|"), temp_pos$Variable),]
#Create the dataframe
df<-NULL
for(j in 1:length(varList)){
df<-cbind(df,substr(temp_cps,temp_pos$Pos1[j],temp_pos$Pos2[j]))
}
df<-as.data.frame(df)
names(df)<-varList
assign(paste("CPS", i, sep = "."), df)
}
#AutoMate appending each year
for (k in 1:(length(lst2)-1)){
if(k==1){
CPS1 <- get(paste("CPS", k, sep = "."))
CPS2 <- get(paste("CPS", k+1, sep = "."))
#Append to keep only rows of second data set
merged_data=rbind(CPS1,CPS2)
}
else{
CPS_C <- get(paste("CPS", k+1, sep = "."))
merged_data=merged_data=rbind(merged_data,CPS_C)
}
if(k==length(lst2)-1){
#Clear Console
rm(list=setdiff(ls(), "merged_data"))
}
}
This is what it looks like before it breaks
This what happens after adding more than 4 variables
I think I figured it out. Need to run a few extra variables to confirm. But the program currently won't work if my list of variables is not in order in terms of position. For example, if "HRYEAR4" is 82-84 and "GESTFIPS" is 93-94 then the program will fail if I put GETSFIPS before HRYEAR4 in varList. However, if HRYEAR comes first then the program will run as intended. Does anyone, have any quick idea how to replace this line df<-cbind(df,substr(temp_cps,temp_pos$Pos1[j],temp_pos$Pos2[j])) to make it more dynamic and not have this issue? If not, it's not a big deal for the moment I'll just put them in order and see if I can find a better solution in the future. Thanks to anyone who tried to help.

A cell in a CSV is (wrongly) read as a character vector of length 2 in R

I have a data frame like this I read in from a .csv (or .xlsx, I've tried both), and one of the variables in the data frame is a vector of dates.
Generate the data with this
Name <- rep("Date", 15)
num <- seq(1:15)
Name <- paste(Name, num, sep = "_")
data1 <- data.frame(
Name,
Due.Date = seq(as.Date("2020/09/24", origin = "1900-01-01"),
as.Date("2020/10/08", origin = "1900-01-01"), "days")
)
When I reference one of the cells specifically, like this: str(project_dates$Due.Date[241]) it reads the date as normal.
However, the exact position of the important dates varies from project to project, so I wrote a command that identifies where the important dates are in the sheet, like this: str(project_dates[str_detect(project_dates$Name, "Date_17"), "Due.Date"])
This code worked on a few projects, but on the current project it now returns a character vector of length 2. One of the values is the date, and the other value is NA. And to make matters worse, the location of the date and the NA is not fixed across dates--the date is the first value in some cells and the second in others (otherwise I would just reference, e.g., the first item in the vector).
What is going on here, but more importantly, how do I fix this?!
Clarification on the second command:
When I was originally reading from an Excel file, the command was project_dates[str_detect(project_dates$Name, "Date_17"), "Due.Date"]$Due.Date because it was returning a 1x1 tibble, and I needed the value in the tibble.
When I switched to reading in data as a csv, I had to remove the $Due.Date because the command was now reading the value as an atomic vector, so the $ operator was no longer valid.
Help me, Oh Blessed 1's (with) Knowledge! You're my only hope!
Edited to include an image of the data like the one that generates the error
I feel sheepish.
I was able to remove the NAs with
data1<- data1[!is.na(data1$Due.Date), ].
I assumed that command would listwise delete the rows with any missing values, so if the cell contained the 2-length vector, then I would lose the whole row of data. Instead, it removed the NA from the cell, leaving only the date.
Thank you to everyone who commented and offered help!

Vectorise an imported variable in R

I have imported a CSV file to R but now I would like to extract a variable into a vector and analyse it separately. Could you please tell me how I could do that?
I know that the summary() function gives a rough idea but I would like to learn more.
I apologise if this is a trivial question but I have watched a number of tutorial videos and have not seen that anywhere.
Read data into data frame using read.csv. Get names of data frame. They should be the names of the CSV columns unless you've done something wrong. Use dollar-notation to get vectors by name. Try reading some tutorials instead of watching videos, then you can try stuff out.
d = read.csv("foo.csv")
names(d)
v = d$whatever # for example
hist(v) # for example
This is totally trivial stuff.
I assume you have use the read.csv() or the read.table() function to import your data in R. (You can have help directly in R with ? e.g. ?read.csv
So normally, you have a data.frame. And if you check the documentation the data.frame is described as a "[...]tightly coupled collections of variables which share many of the properties of matrices and of lists[...]"
So basically you can already handle your data as vector.
A quick research on SO gave back this two posts among others:
Converting a dataframe to a vector (by rows) and
Extract Column from data.frame as a Vector
And I am sure they are more relevant ones. Try some good tutorials on R (videos are not so formative in this case).
There is a ton of good ones on the Internet, e.g:
* http://www.introductoryr.co.uk/R_Resources_for_Beginners.html (which lists some)
or
* http://tryr.codeschool.com/
Anyways, one way to deal with your csv would be:
#import the data to R as a data.frame
mydata = read.csv(file="SomeFile.csv", header = TRUE, sep = ",",
quote = "\"",dec = ".", fill = TRUE, comment.char = "")
#extract a column to a vector
firstColumn = mydata$col1 # extract the column named "col1" of mydata to a vector
#This previous line is equivalent to:
firstColumn = mydata[,"col1"]
#extract a row to a vector
firstline = mydata[1,] #extract the first row of mydata to a vector
Edit: In some cases[1], you might need to coerce the data in a vector by applying functions such as as.numeric or as.character:
firstline=as.numeric(mydata[1,])#extract the first row of mydata to a vector
#Note: the entire row *has to be* numeric or compatible with that class
[1] e.g. it happened to me when I wanted to extract a row of a data.frame inside a nested function

Using cbind to create a columnar .csv file but 'X' always appears in row one

I have a script that is working perfectly except that in my R cbind operation, adjacent to the numerical value that I require in the first row, is an 'X'.
Here is my script:
library(ncdf)
library(Kendall)
library(forecast)
library(zoo)
setwd("/home/cohara/RainfallData")
files=list.files(pattern="*.nc")
j=81
for (i in seq(1,9))
{
file<-open.ncdf(sprintf("/home/cohara/RainfallData/%s.nc",i))
year<-get.var.ncdf(file,"time")
data<-get.var.ncdf(file,"var61")
fit<-lm(data~year) #least sqaures regression
mean=rollmean(data,4,fill=NA)
kendall<-Kendall(data,year)
write.table(kendall[[2]],file="/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_-_89_years.csv",append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)
write.table(kendall[[1]],file="/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_-_89_years.csv",append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)
png(sprintf("./10 percent increase over %s years.png",j))
par(family="serif",mar=c(4,6,4,1),oma=c(1,1,1,1))
plot(year,data,pch="*",col=4,ylab="Precipitation (mm)",main=(sprintf("10 percent increase over %s years",j)),cex.lab=1.5,cex.main=2,ylim=c(800,1400),abline(fit,col="red",lty=1.5))
par(new=T)
plot(year,mean,type="l",xlab="year",ylab="Precipitation (mm)",cex.lab=1.5,ylim=c(800,1400),lty=1.5)
legend("bottomright",legend=c("Kendall tau = ",kendall[[1]]))
legend("bottomleft",legend=c("Kendall 2-tailed p-value = ",kendall[[2]]))
legend(x="topright",c("4 year moving average","Simple linear trend"),lty=1.5,col=c("black","red"),cex=1.2)
legend("topleft",c("Annual total"),pch="*",col="blue",cex=1.2)
dev.off()
j=j+1
}
tmp<-read.csv("/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_to_89_years.csv")
tmp2<-read.csv("/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_-_89_years.csv")
tmp<-cbind(tmp,tmp2)
tmp3<-read.csv("/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_to_89_years.csv")
tmp4<-read.csv("/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_-_89_years.csv")
tmp3<-cbind(tmp3,tmp4)
write.table(tmp,"/home/cohara/RainfallAnalysis/Kendall_p-value_for_10%_increase_over_81_to_89_years.csv",sep="\t",row.names=FALSE)
write.table(tmp3,"/home/cohara/RainfallAnalysis/Kendall_tau_for_10%_increase_over_81_to_89_years.csv",sep="\t",row.names=FALSE)
The output looks like this, from the .csv files created:
X0.0190228056162596 X0.000701081415172666
0.0395622998 0.00531819
0.0126547674 0.0108218994
0.0077754743 0.0015568719
0.0001407317 0.002680057
0.0096391216 0.012719159
0.0107234037 0.0092436085
0.0503448173 0.0103918528
0.0167525802 0.0025036721
I want to be able to use excel functions on the data, so, for simplicity, I don't want row names (I'll be running this loop maybe a hundred times), but I need column names because otherwise the first set of values is cut off.
Can anyone tell me where the 'X' is coming from and how to get rid of it?
Thanks in advance,
Ciara
Here is what I think is going on. Start by running these small examples:
df1 <- read.csv(text = "0.0190228056162596, 0.000701081415172666
0.0395622998, 0.00531819
0.0126547674, 0.0108218994")
df2 <- read.csv(text = "0.0190228056162596, 0.000701081415172666
0.0395622998, 0.00531819
0.0126547674, 0.0108218994", header = FALSE)
df1
df2
str(df1)
str(df2)
names(df1)
names(df2)
make.names(c(0.0190228056162596, 0.000701081415172666))
Please read ?read.csv and about the header argument. As you will find, header = TRUE is default in read.csv. Thus, if the csv file you read lacks header, read.csv will still 'assume' that the file has a header, and use the values in the first row as a header. Another argument in read.csv is check.names, which defaults to TRUE:
If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names).
In your case, it seems that the data you read lack a header and that the first row is numbers only. read.csv will default treat this row as a header. make.names takes values in the first row (here numbers 0.0190228056162596, 0.000701081415172666), and spits out the 'syntactically valid variable names' X0.0190228056162596 and X0.000701081415172666. Which is not what you want.
Thus, you need to explicitly set header = FALSE to avoid that read.csvconvert the first row to (valid) variable names.
For next time, please provide a minimal, self contained example. Check these links for general ideas, and how to do it in R: here, here, here, and here

Resources