Trying to create a new variable in a loop in R, but it fails

I am new to R. I have already imported the data from all my txt files using the code below, but I also want to create a new variable, called case, while importing. The value of case should be 1 for the first row and 0 for the rest.
When I run the code, the console does not report anything wrong and the data are imported, but the new variable is not created. I don't know why.
for(i in Filenames){
perpos <- which(strsplit(i, "")[[1]]==".")
data=assign(
gsub(" ","",substr(i, 1, perpos-1)),
read.table(paste(filepath,i,sep=""),fill=TRUE,header=TRUE,quote ="",row.names = NULL,sep="\t")
)
strsplit(i, "")
filename = strsplit(as.character(i),"\\.txt")
data$case = ifelse(data$NAME=="filename",1,0)
}

Thanks guys! I used @joosts's code and made some adjustments. The code below works just fine.
fn <- paste(filepath,Filenames,sep="")
mylist <- lapply(fn, read.table,fill = TRUE, header = TRUE, quote = "",row.names = NULL, sep = "\t",stringsAsFactors=FALSE)
for(i in 1:length(Filenames)){
mylist[[i]]<- cbind(mylist[[i]], case = 0)
if(nrow(mylist[[i]])>0) {
mylist[[i]]$case[1] <- 1
}
mylist[[i]]<- cbind(mylist[[i]], ID = i)
}
do.call(rbind, mylist)

I am assuming you want to read in multiple text files, each containing the same columns (in the same order). To combine multiple data frames (the objects returned by read.table()), call rbind().
Your code for getting the filename without its extension also looks more complex than it needs to be; a single strsplit() call is enough.
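data <- NULL  # start empty; rbind() ignores NULL, so the first file needs no special case
for(file in filenames) {
  sanitized_filename <- gsub(" ", "", strsplit(file, "\\.")[[1]][1])
  file.frame <- read.table(paste(filepath, file, sep=""), fill = TRUE, header = TRUE, quote = "", row.names = NULL, sep = "\t")
  file.frame <- cbind(file.frame, name = I(sanitized_filename), case = 0)
  if(nrow(file.frame)>0) {
    file.frame$case[1] <- 1
  }
  # ifelse() is vectorized and would return only the first element of the result,
  # and exists("data") is always TRUE because of utils::data(), so append directly
  data <- rbind(data, file.frame)
}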
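As for why the original loop ran without errors but left no case column: assign() stores one copy of the data frame under the file's name, and data is a second, separate copy, so data$case never reaches the named object (and data is overwritten on the next iteration). The comparison data$NAME=="filename" also tests against the literal text "filename" rather than the variable. A minimal sketch of the first point, using a made-up three-row data frame in place of an imported file (df and file1 are illustrative names, not from the question):

```r
# Toy data frame standing in for one imported file (hypothetical values)
df <- data.frame(NAME = c("a", "b", "c"))

# assign() returns its value, so `data` becomes a second copy of the data frame
data <- assign("file1", df)

# This changes only `data`; the object named file1 is untouched
data$case <- as.integer(seq_len(nrow(data)) == 1)

"case" %in% names(data)   # TRUE
"case" %in% names(file1)  # FALSE

# One fix: add the column first, then assign the finished data frame
df$case <- as.integer(seq_len(nrow(df)) == 1)
assign("file1", df)
file1$case
```

Building the column before calling assign() (or collecting the data frames in a list, as in the working code above) avoids the copy problem altogether.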

Related

How to save a value that's generated and used in an r function to keep it in the environment?

I have a function (Save.R) that creates a few variables and saves them in a table for further use.
I also have a matrix in my main code, and I want to replace some of its cells with a FileName that is generated inside the function.
Question: how do I keep FileName and save it to my environment?
*I'm new to R, so please explain in simple words.
I have tried passing my matrix as an input to Save.R and replacing cells as it generates the FileName(s), but it does not work.
for (i in 1:435){
X = subset(NGAW2_Flatfile_Vertical_5percentdamping, grepl(Uniques[i,1],
NGAW2_Flatfile_Vertical_5percentdamping$`Station ID No.`))
if (nrow(X)==1){
# Match[count,] = subset(NGAW2_Flatfile_Vertical_5percentdamping, grepl(Uniques[i,1], NGAW2_Flatfile_Vertical_5percentdamping$`Station ID No.`))
Match[count,] = X[1,]
H1 = substring(X[1,113], 10,15)
H2 = substring(X[1,114], 10,15)
V = substring(X[1,115], 10,15)
St.ID = substring(X[1,9], 1, 7)
Save(H1, H2, V, Match)
count=count+1
}
}
Save <- function(H1, H2, V){
H1 = paste(H1, ".DAT", sep = "")
data = read.delim(H1, sep = "", header = FALSE)
When1 = substring(data[2,1],2,11)
FileName1 = paste("20", When1, "_", St.ID, "_", "H1", sep = "" )
}
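Variables created inside an R function are local to it and disappear when the function finishes; what survives is the value the function returns. One common pattern, sketched here under assumptions, is to return the generated name and assign it in the calling code. The function make_file_name, the input values, and the column names of Match are all hypothetical stand-ins, not the asker's real Save() or data:

```r
# Hypothetical stand-in for Save(): everything it needs comes in as arguments,
# and the generated name is handed back with return()
make_file_name <- function(when, station_id, component) {
  file_name <- paste0("20", when, "_", station_id, "_", component)
  return(file_name)
}

# Made-up inputs; in the real code these would come from the .DAT file and X
whens <- c("0101011200", "0202021300")
station_ids <- c("ST00001", "ST00002")

# Placeholder matrix with one row per matched record
Match <- matrix(NA_character_, nrow = 2, ncol = 2,
                dimnames = list(NULL, c("station", "file")))

for (i in seq_along(whens)) {
  Match[i, "station"] <- station_ids[i]
  # The returned value is assigned here, in the calling code
  Match[i, "file"] <- make_file_name(whens[i], station_ids[i], "H1")
}

Match
```

Passing St.ID in as an argument, rather than relying on the function finding it in the global environment, also makes the function easier to test on its own.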

adding two lists together using cbind

This program works because I made the variables inside lapply global by using the <<- operator. However, it does not work with the real files in the real program. These are .tsv files with named columns. The answer I get when I run the real program is: Error: (converted from warning) Error in : (converted from warning) Error in : arguments imply differing number of rows: 3455, 4319. What might be causing this?
lc <- list("test.txt", "test.txt", "test.txt", "test.txt")
lc1 <- list("test.txt", "test.txt", "test.txt")
lc2 <- list("test.txt", "test.txt")
#list of lists. The lists contain file names
lc <- list(lc, lc1, lc2)
#new names for the three lists in the list of lists
new_dataFns <- list("name1", "name2", "name3")
file_paths <- NULL
new_path <- NULL
#add the file names to the path and read and merge the contents of each list in the list of lists
lapply(
lc,
function(lc) {
filenames <- file.path(getwd(), lc)
dataList <<- lapply(filenames, function (lc) read.table(file=lc, header=TRUE))
dataList <<- lapply(dataList, function(dataList) {merge(as.data.frame(dataList),as.data.frame(dataList))})
}
)
#add the new name of the file to the path; in total there will be 3 paths/file_newname.tsv
lapply(new_dataFns, function(new_dataFns) {new_path <<- file.path(getwd(), new_dataFns)})
print(new_path)
print(dataList)
finalFiles <- merge(as.data.frame(dataList), as.data.frame(new_path))
print(finalFiles)
I found a solution to the problem by writing a different type of code. Please see below. The input to the function is provided by the app input widgets:
glyCount1 <- function(answer = NULL, fileChoice = NULL, combination = NULL, enteredValue = NULL, nameList) {
lc = nameList
new_dataFns <- gsub(" ", "", nameList)
first_path <- NULL
new_path <- NULL
old_path <- NULL
file_content <- NULL
  for(i in seq_along(lc)){
    for(j in seq_along(lc[[i]])){
      if(!is.null(lc[[i]])){
        first_path[[j]]<- paste(getwd(), "/", lc[[i]][j], sep = "")
        # read the j-th file of group i (the original indexed first_path with i here)
        tryCatch(file_content[[j]] <- read.csv(file = first_path[[j]], header = TRUE, sep = ","), error = function(e) NULL)
        # all files of group i are appended to one output file named after i
        old_path[[j]] <- paste(getwd(), "/", i, ".csv", sep = "")
        write.table(file_content[[j]], file = old_path[[j]], append = TRUE, col.names = FALSE)
      }
    }
  }
}
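On the original error message: "arguments imply differing number of rows" is what data.frame() reports when it is asked to put columns of different lengths side by side, which is probably what as.data.frame(dataList) does once the real files have different row counts (the test program reads the same file every time, so the lengths always match). A small sketch with made-up data frames a and b:

```r
# Two toy data frames with different numbers of rows (hypothetical data)
a <- data.frame(x = 1:3)
b <- data.frame(x = 1:5)

# Converting the list to a single data frame tries to bind them column-wise,
# which fails because 3 rows cannot be lined up with 5 rows
res <- tryCatch(as.data.frame(list(a, b)),
                error = function(e) conditionMessage(e))
res

# Stacking row-wise works as long as the column names match
stacked <- do.call(rbind, list(a, b))
nrow(stacked)  # 8
```

Whether stacking is the right operation depends on what the combined file should contain; for a key-based join, merge() with an explicit by column would be the usual choice.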

For-loop in R to create a new file (but gives incorrect/unexpected output)

I'm currently busy with some data and I need to check their validity.
Therefore, I would like to use a for-loop to go through all my data files.
In this for-loop, I would like to calculate some things (like mean, min,max...).
My code below works but produced an incorrectly written csv file. The problem occurs after the calculations (and their values) are done during csv file creation. CSV:
"c.1..1..1004.89081855716..630.174466667434..461.738905906677.." "c.1..1..950.990843858612..479.98560814955..517.955102920532.."
1 1
1 1
1004.89081855716 950.990843858612
630.174466667434 479.98560814955
461.738905906677 517.955102920532
1535.86795806885 1452.30199813843
-13.3948961645365 3.72026950120926
1259.26423788071 1159.17089223862
Approach/What I'm expecting:
So I start from some data files with eye tracking data in it.
As you can see at the beginning of the code, I try to get some values out of this eye tracking data (validity, new file with only validity == 1 data...). Once I created the filtered_data dataframe, I want to calculate some extra values out of it (mean, sd, min/max).
My plan is to create a new csv file (validity_loop.csv) in which I can find all my calculations (validity_left, validity_right,mean_eye_x, mean_eye_y, min_eye_x,max_eye_x,min_eye_y,max_eye_y). All in a row. One row for each data set (file_list[i]).
Can someone help me in how to tackle and solve this issue?
Here is my code:
set <- setwd("/Users/Sarah/Documents")
file_list <- list.files(set, pattern = ".csv", all.files = TRUE)
validity_list <- data_list <- vector("list", "length" = length(file_list))
for(i in seq_along(file_list)){
filename = file_list[i]
#read files
data_frame = read.csv(filename, sep = ",", dec = ".",
header = TRUE,
stringsAsFactors = FALSE)
#what has to be done
#validity
validity_left <- mean(is.numeric(data_frame$left_gaze_point_validity))
validity_right <-mean(is.numeric(data_frame$right_gaze_point_validity))
#Zuiver dataframe (validity ==1)
to_keep = which(data_frame$left_gaze_point_validity == 1 &
data_frame$right_gaze_point_validity==1)
filtered_data = data_frame[to_keep,]
filtered_data$left_eye_x = as.numeric(filtered_data$left_eye_x)
filtered_data$left_eye_y = as.numeric(filtered_data$left_eye_y)
filtered_data$right_eye_x = as.numeric(filtered_data$right_eye_x)
filtered_data$right_eye_y = as.numeric(filtered_data$right_eye_y)
#1 eye-data
filtered_data$eye_x <- (filtered_data$left_eye_x+filtered_data$right_eye_x)/2
filtered_data$eye_y <- (filtered_data$left_eye_y+filtered_data$right_eye_y)/2
#Pixels
filtered_data$eye_x <- (filtered_data$eye_x)*1920
filtered_data$eye_y <- (filtered_data$eye_y)*1080
#SD and Mean + min-max
mean_eye_x<- mean(filtered_data$eye_x)
mean_eye_y <- mean(filtered_data$eye_y)
sd_eye_x <- sd(filtered_data$eye_x)
sd_eye_y <- sd(filtered_data$eye_y)
min_eye_x <- min(filtered_data$eye_x)
min_eye_y <- min(filtered_data$eye_y)
max_eye_x <- max(filtered_data$eye_x)
max_eye_y <- max(filtered_data$eye_y)
#add everything to new file
validity_list[[i]] <- c(validity_left, validity_right,
mean_eye_x, mean_eye_y,
min_eye_x, min_eye_y,
max_eye_x, max_eye_y)
}
#new document
write.table(validity_list,
file = "Master T&O/Thesis /Loop/Validity/validity_loop.csv",
col.names = TRUE, row.names = FALSE)
I managed to get a new data frame in R that contains the values of my validity_list in matrix form.
#FOR LOOP poging 2
set <- setwd("/Users/Sarah/Documents/Master T&O/Thesis /Loop")
file_list <- list.files(set, pattern = ".csv", all.files = TRUE)
validity_list <- vector("list", "length" = length(file_list))
for(i in seq_along(file_list)){
filename = file_list[i]
#read files
data_frame = read.csv(filename, sep = ",", dec = ".", header = TRUE, stringsAsFactors = FALSE)
#what has to be done
#validity
validity_left <- mean(is.numeric(data_frame$left_gaze_point_validity))
validity_right <-mean(is.numeric(data_frame$right_gaze_point_validity))
#Zuiver dataframe (validity ==1)
to_keep = which(data_frame$left_gaze_point_validity == 1 & data_frame$right_gaze_point_validity==1)
filtered_data = data_frame[to_keep,]
filtered_data$left_eye_x = as.numeric(filtered_data$left_eye_x)
filtered_data$left_eye_y = as.numeric(filtered_data$left_eye_y)
filtered_data$right_eye_x = as.numeric(filtered_data$right_eye_x)
filtered_data$right_eye_y = as.numeric(filtered_data$right_eye_y)
#1 eye-data
filtered_data$eye_x <- (filtered_data$left_eye_x+filtered_data$right_eye_x)/2
filtered_data$eye_y <- (filtered_data$left_eye_y+filtered_data$right_eye_y)/2
#Pixels
filtered_data$eye_x <- (filtered_data$eye_x)*1920
filtered_data$eye_y <- (filtered_data$eye_y)*1080
#SD and Mean + min-max
mean_eye_x<- mean(filtered_data$eye_x)
mean_eye_y <- mean(filtered_data$eye_y)
sd_eye_x <- sd(filtered_data$eye_x)
sd_eye_y <- sd(filtered_data$eye_y)
min_eye_x <- min(filtered_data$eye_x)
min_eye_y <- min(filtered_data$eye_y)
max_eye_x <- max(filtered_data$eye_x)
max_eye_y <- max(filtered_data$eye_y)
#add everything to new file
validity_list[[i]] <- c(validity_left, validity_right,mean_eye_x, mean_eye_y, min_eye_x,max_eye_x,min_eye_y,max_eye_y)
validity_matrix <- matrix(unlist(validity_list), ncol = 8, byrow = TRUE)
}
#new document
write.table(validity_matrix, file = "/Users/Sarah/Documents/Master T&O/Thesis /Loop/Validity/validity_loop.csv", dec = ".")
The only problem I have now, is the fact that my values for the validity_list items are wrong, but that's another problem and I'm trying to fix it!
If I understand correctly, the following line gathers all your values together:
validity_list[[i]] <- c(validity_left, validity_right, mean_eye_x,
mean_eye_y, min_eye_x, max_eye_x, min_eye_y, max_eye_y)
If this behaves like Python, the equivalent would be:
validity_list = (validity_left, validity_right, mean_eye_x,
mean_eye_y, min_eye_x, max_eye_x, min_eye_y, max_eye_y)
Here the '=' tells the interpreter that everything after it is a tuple '(', data, ')', which makes it one single dataset; written out as it is, it ends up in one column. If you instead pick the items out one by one in a for-loop, each value such as "validity_left" is written to a separate column. Would adding something like this to your code be an option?
for item in validity_list:
function to process item..etc.
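In R terms, write.table() treats a plain list as a set of columns, which is why each file ended up as one column with a generated header. A minimal sketch of getting one row per file instead, using made-up summary values and only four of the eight statistics: name the elements of each vector, bind the vectors row-wise, and write the resulting data frame. The last lines also show a likely cause of the wrong validity values, on the assumption that validity is coded 0/1: is.numeric() returns a single TRUE/FALSE for the whole column.

```r
# Hypothetical per-file summaries standing in for the values computed in the loop
summaries <- list(
  c(validity_left = 0.9, validity_right = 0.8, mean_eye_x = 1004.9, mean_eye_y = 630.2),
  c(validity_left = 1.0, validity_right = 0.7, mean_eye_x = 951.0, mean_eye_y = 480.0)
)

# rbind() turns each named vector into one row; the names become column names
validity_df <- as.data.frame(do.call(rbind, summaries))

# Write to a temporary file here; the real path would go in its place
out_file <- tempfile(fileext = ".csv")
write.csv(validity_df, out_file, row.names = FALSE)
read.csv(out_file)

# is.numeric() describes the type of the column, not its values
validity <- c(1, 0, 1, 1)
mean(is.numeric(validity))  # 1
mean(validity == 1)         # 0.75
```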

Removing new lines in R

I am trying to bring multiple rows into one cell in my CSV file. I first converted my text file into a CSV file, but the final column needs to have all of its contents in one cell, and it is currently being split across several. The CSV file currently looks like the first picture and needs to look like the second picture.
I have the following code:
mydata = read.table ("rolled_swiftmessage_test.txt", sep="|", allowEscapes
= TRUE, fill = FALSE)
write.table(mydata, file="rolled_swiftmessage_test.csv",sep=",",col.names=
FALSE,row.names= FALSE)
Currently it produces Picture_1, and I need it to produce picture_2. How do I fix it? Thanks!
After corresponding with the OP and seeing the kind of data that she has, this is my updated answer:
mydata <- read.table("Test_TextFile.txt", sep="|", allowEscapes = TRUE, fill = FALSE, stringsAsFactors = FALSE)
# Remove rows full of dashes in one step; deleting rows inside a loop over
# 1:nrow(mydata) shifts the remaining indices and can skip consecutive matches
mydata <- mydata[!grepl('^\\-+$', mydata$V1), ]
empty_rows <- which(grepl('^\\s*$', mydata$V1))
rows_to_squeeze <- split(empty_rows, cumsum(c(1, diff(empty_rows) != 1)))
for(i in length(rows_to_squeeze):1) {
mydata$V12[rows_to_squeeze[[i]][1] - 1] <- paste(mydata$V12[seq(rows_to_squeeze[[i]][1] - 1, rows_to_squeeze[[i]][length(rows_to_squeeze[[i]])])], collapse = ' ')
mydata <- mydata[-seq(rows_to_squeeze[[i]][1], rows_to_squeeze[[i]][length(rows_to_squeeze[[i]])]),]
}
write.table(mydata, file="rolled_swiftmessage_test.csv", sep=",", col.names = FALSE, row.names = FALSE)
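The split(empty_rows, cumsum(...)) line is the part that groups consecutive empty rows into runs, so that each run can be squeezed into the row just above it. A small sketch on made-up row numbers shows what it produces:

```r
# Hypothetical row indices of empty rows: three separate runs
empty_rows <- c(3, 4, 5, 9, 10, 14)

# diff() != 1 marks the start of each new run; cumsum() turns those marks into run ids
run_id <- cumsum(c(1, diff(empty_rows) != 1))
run_id  # 1 1 1 2 2 3

# split() then collects the indices belonging to each run
runs <- split(empty_rows, run_id)
runs
```

The answer's loop walks these runs from last to first so that deleting rows in one run does not change the row numbers of the runs still to be processed.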
Original answer
Here you have my attempt at this. It's not pretty, but I think it works. Basically, I read the file as lines of text, not a table, I operate on the lines to join those that belong on the same 'message' cell, and then I put them in a nice data frame that can be saved as a csv file. Let me know if you need any other tweaks:
install.packages('stringr') ## if not installed yet
library(stringr) ## in order to use str_detect and str_split below
mydata <- readLines("rolled_swiftmessage_test.txt")
new_mydata = vector('character')
current <- 1
while(!is.na(mydata[current])) {
if(str_detect(mydata[current], '\\{')) {
i <- 1
while(!str_detect(mydata[current + i], '\\}')) {
mydata[current] <- paste(mydata[current], mydata[current + i], collapse = ' ')
i = i + 1
}
mydata[current] <- paste(mydata[current], mydata[current + i], collapse = ' ')
mydata[current] <- gsub('\\| \\| \\| \\|', '', mydata[current])
new_mydata = c(new_mydata, mydata[current])
current = current + i + 1
} else {
new_mydata = c(new_mydata, mydata[current])
current = current + 1
}
}
new_mydata <- sapply(new_mydata, function(x) str_split(x, '\\|'))
new_mydata <- as.data.frame(t(as.data.frame(new_mydata)))
write.table(new_mydata, file="rolled_swiftmessage_test.csv", sep=",", col.names = FALSE, row.names = FALSE)
The resulting image after opening the csv file (notice that I added the same row to the original text file three times just so that I would have more lines for testing):

write results sequentially in a loop in r

I have a bunch of individual files to which I need to apply a test. I need a way to automatically write the results for each file into a single file. Here is what I do:
library(ape)
stud_files <- list.files("path/dir/data",full.names = T)
for (f in stud_files) {
df <- read.table(f, header=TRUE, sep=";")
df_xts <- as.xts(df$cola, order.by = as.Date(df$colb,"%m/%d/%Y"))
pet <- testa(df_xts)
res <- data.frame(estimate = pet$estimate,
p.value=pet$p.value,
logi = pet$alternative)
write.dna(res,file = "res_testa.xls",format = "sequential")
}
This loop works well, except for the last command, which is meant to write the results for each file consecutively: it saves only the last result. The results are also saved as a string, not as a table as I defined above (data.frame). Any ideas? Thanks in advance.
Check help(write.dna).
write.dna(x, file, format = "interleaved", append = FALSE,
nbcol = 6, colsep = " ", colw = 10, indent = NULL,
blocksep = 1)
append a logical, if TRUE the data are appended to the file without
erasing the data possibly existing in the file, otherwise the file (if
it exists) is overwritten (FALSE the default).
Set append = TRUE and you should be all set.
As some of the comments point out, however, you are probably better off generating your table, and then writing it all at once to a file. Unless you have billions of files, you likely won't run out of memory.
Here is how I would approach this.
library(ape)
library(xts)  # provides as.xts(), used below
library(data.table)
stud_files <- list.files("path/dir/data",full.names = T)
sumfunc <- function(f) {
df <- read.table(f, header=TRUE, sep=";")
df_xts <- as.xts(df$cola, order.by = as.Date(df$colb,"%m/%d/%Y"))
pet <- testa(df_xts)
res <- data.table(estimate = pet$estimate,
p.value=pet$p.value,
logi = pet$alternative)
return(res)
}
lres <- lapply(stud_files, sumfunc)
dat <- rbindlist(lres)
write.table(dat,
file = "res_testa.csv",
sep = ",",
quote = FALSE,
row.names = FALSE)
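If writing inside the loop is still preferred, write.table() accepts append = TRUE as well and, unlike write.dna(), is meant for data frames. A minimal sketch under assumptions: the result values are made up, and a temporary file stands in for the real output path. The header is written only on the first pass so that it is not repeated for every file.

```r
# Temporary output file standing in for the real results file
out_file <- tempfile(fileext = ".csv")

for (k in 1:3) {
  # Made-up one-row result standing in for the test output of file k
  res <- data.frame(estimate = k * 0.1, p.value = 0.05 / k)

  # First pass: create the file with a header; later passes: append rows only
  first <- !file.exists(out_file)
  write.table(res, out_file, sep = ",", quote = FALSE,
              append = !first, col.names = first, row.names = FALSE)
}

read.csv(out_file)
```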

Resources