Double for loop to save several files using R - r

I am trying to do a “for loop” to generate files based on the column "group". I want to create a file for each group. My data is much bigger, but a sample would be:
id = c(1,2,3,4,5,6,7,8,9,10)
group = c(3,1,3,2,1,3,1,2,4,4)
weight = c(10,11,12,13,14,15,16,17,18,19)
index1 = c(50,50,50,50,50,50,50,50,50,50)
index2 = c(50,50,50,50,50,50,50,50,50,50)
data = data.frame(id,group,weight,index1,index2)
for (i in unique(data$group)){
  for (j in 1:nrow(data)){
    data$weight[j] = ifelse(data$group[j] == data$group[i], 0,data$weight[j])
    data$index1[j] = ifelse(data$group[j] == data$group[i], 0,50)
    data$index2[j] = ifelse(data$group[j] == data$group[i], 5,50)
  }
  write.table(data,paste("/home/paulaf/test/",data$group[i],".txt",sep=""),
              quote=F,row.names=F,col.names=T)
}
It seems to work, but it doesn’t write all the files. Any help would be very much appreciated. Thanks in advance.

Paula,
That code is actually writing four files. But you're overwriting one of those files, so you're only ending up with three.
When you name the file with paste, you're using data$group[i] to generate the name. If you print those names with cat() or something similar, you'll notice you have two 3.txt files.
/home/paulaf/test/3.txt
/home/paulaf/test/3.txt
/home/paulaf/test/1.txt
/home/paulaf/test/2.txt
So, that's why you're not getting all of your files. Your first 3.txt is overwritten.
Looking a bit more closely at your data object, you can see why this happened.
The i in your outer loop takes the values 3, 1, 2, and 4 (the unique groups, in order of first appearance). When you plug those values into data$group[i], you are pulling out rows 3, 1, 2, and 4 of data$group, not the groups themselves. Notice that the first and third rows are both group 3. This is the data object after the loop has run:
   id group weight index1 index2
1   1     3      0     50     50
2   2     1      0     50     50
3   3     3      0     50     50
4   4     2      0      0      5
5   5     1      0     50     50
6   6     3      0     50     50
7   7     1      0     50     50
8   8     2      0      0      5
9   9     4     18     50     50
10 10     4     19     50     50
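The mismatch is easy to confirm at the console. A minimal sketch, using only the group vector from the question, that prints each loop value next to what data$group[i] returns for it (the names loop_values and names_used are made up for this illustration):

```r
group <- c(3, 1, 3, 2, 1, 3, 1, 2, 4, 4)

# unique() keeps values in order of first appearance: 3 1 2 4
loop_values <- unique(group)

# Using those values as row positions picks rows 3, 1, 2 and 4 of group
names_used <- group[loop_values]

for (k in seq_along(loop_values)) {
  cat("i =", loop_values[k], " data$group[i] =", names_used[k], "\n")
}
```

The second column repeats 3 and never reaches 4, which is why one file is overwritten and 4.txt is never written.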
Maybe replace your write.table() with this:
write.table(data,paste("/home/paulaf/test/",i,".txt",sep=""),
quote=F,row.names=F,col.names=T)
And one other note to save you future headache: It's often helpful to print some of your variables to the console. It's just a way to get some insight into what's happening.
Also, good luck, keep working with R, you're doing great!

unique(data$group) is a vector of length 4. data$group has a length of 10. You're setting the filenames to the first 4 values of data$group instead of the unique values of data$group.
Try replacing data$group[i] with just i inside the paste that generates the filename, e.g.
for (i in unique(data$group)){
  for (j in 1:nrow(data)){
    data$weight[j] = ifelse(data$group[j] == data$group[i], 0,data$weight[j])
    data$index1[j] = ifelse(data$group[j] == data$group[i], 0,50)
    data$index2[j] = ifelse(data$group[j] == data$group[i], 5,50)
  }
  fileName = paste("/home/paulaf/test/",i,".txt",sep="")
  write.table(data,fileName,quote=F,row.names=F,col.names=T)
}

Your problem is very simple. Inside your write.table call, you're building the file name from data$group[i], but your outer loop is not looping over row indices; it loops over the group values themselves. Your values of i are 3, 1, 2, 4, so data$group[i] evaluates to 3, 3, 1, 2, which means the file names are wrong (one file is overwritten and you end up with only 3 files for this sample). The solution is then:
write.table(data,paste("/home/paulaf/test/",i,".txt",sep=""),
quote=F,row.names=F,col.names=T)}
It's also slightly more efficient (and easier to read, imho) to use paste0, so:
write.table(data,paste0("/home/paulaf/test/",i,".txt"),
quote=F,row.names=F,col.names=T)}
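The answers above only fix the file name. The comparison data$group[j] == data$group[i] inside the loop has the same indexing issue, and data is modified cumulatively from one iteration to the next. A hedged sketch of one way to restructure the loop, assuming the intent is that each file changes only the rows of its own group. The output directory under tempdir() and the names g, out and in_group are placeholders, not part of the original code:

```r
data <- data.frame(
  id     = 1:10,
  group  = c(3, 1, 3, 2, 1, 3, 1, 2, 4, 4),
  weight = 10:19,
  index1 = 50,
  index2 = 50
)

# Placeholder output directory; the question writes to /home/paulaf/test/
out_dir <- file.path(tempdir(), "groups")
dir.create(out_dir, showWarnings = FALSE)

for (g in unique(data$group)) {
  out <- data                      # start from the untouched data each time
  in_group <- out$group == g       # compare against the group value itself
  out$weight[in_group] <- 0
  out$index1[in_group] <- 0
  out$index2[in_group] <- 5
  write.table(out, file.path(out_dir, paste0(g, ".txt")),
              quote = FALSE, row.names = FALSE, col.names = TRUE)
}

list.files(out_dir)
```

The logical vector in_group replaces the inner loop over rows, so each group is handled with three vectorized assignments.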

Related

Looping through a column to make a new table in R

I want to make a table called Count_Table and in it, I'd like to count the number of 0s, 1s, and 5s when column "num" == 1, 2, 3, 4, etc.
For example, the code below will count the 0s, 1s, and 5s in column "Visited5" when num == "1". This is great, but I need to do this 34 more times since "num" goes from 1 to 35.
Count_Table <- table(SASS_data[num == "1"]$Visited5)
I am new to R and I don't know how to add 1 to the "num" and loop it until 35 so that the Count_Table includes the counts of 0,1,5 for all nums that exist (1-35). I am sorry if this is confusing and thank you for your help.
lapply will generate a list of tables that span the columns of a dataframe. E.g.,
tablist <- lapply(mtcars, table)
If your dataframe contains columns you want to exclude, you can do that by restricting the dataframe. E.g.,
tablist2 <- lapply(mtcars[, c(2, 4, 7)], table)
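As a small illustration of what that list looks like, here is a sketch using the built-in mtcars data. Each list element is named after its column, so a single table can be pulled out by name:

```r
tablist  <- lapply(mtcars, table)
tablist2 <- lapply(mtcars[, c(2, 4, 7)], table)

# Columns 2, 4 and 7 of mtcars are cyl, hp and qsec
names(tablist2)

# The table for one column: counts of cars per number of cylinders
tablist$cyl
```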
Answer
table() works on multiple dimensions. Just pass both num and Visited5 as arguments. This also works if not all unique values of Visited5 are present in every level of num; those cells will simply be set to 0.
Example
SASS_data <- data.frame(
  num = rep(1:5, each = 5),
  Visited5 = sample(1:3, 25, replace = TRUE)
)
table(SASS_data$num, SASS_data$Visited5)
#     1 2 3
#   1 2 1 2
#   2 1 3 1
#   3 1 1 3
#   4 2 0 3
#   5 2 2 1
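To connect this back to the question, here is a minimal sketch with a small made-up stand-in for SASS_data that uses the values 0, 1 and 5. The data values are invented for illustration; only the column names come from the question:

```r
# Hypothetical stand-in for SASS_data
SASS_data <- data.frame(
  num      = c(1, 1, 1, 2, 2, 3),
  Visited5 = c(0, 1, 5, 5, 5, 0)
)

# Naming the arguments labels the two dimensions of the result
Count_Table <- table(num = SASS_data$num, Visited5 = SASS_data$Visited5)

# Single cell, looked up by labels: how often Visited5 is 5 when num is 2
Count_Table["2", "5"]

# All counts for one value of num
Count_Table["1", ]
```

Indexing with character labels avoids confusing the value 5 with the fifth column.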

Check and replace column values in R dataframe

I have multiple files to read in using R. I iterate through the files in a loop, obtain dataframes and then try to change values of a particular column. Examples of the R dataframes are as follows:
df_A:
ID ZN
1 0
2 1
3 1
4 0
df_B:
ID ZN
1 2
2 1
3 1
4 2
As shown above, the column 'ZN' may contain 0's and 1's in some dataframes and 1's and 2's in others. As I iterate through the files, I want to change only the dataframes whose ZN column has 1's and 2's, like this: 1 to 0 and 2 to 1. Dataframes with ZN values of 0's and 1's should be left unchanged.
My attempt did not work:
if(dataframe$ZN > 1){
dataframe$ZN<-recode(dataframe$ZN,"1=0;2=1")
}
else{
dataframe$ZN
}
Any solutions please?
One approach might be to decrement the value of ZN by one if we detect a single value of 2 anywhere in the column:
if (max(df_A$ZN) == 2) {
  df_A$ZN = df_A$ZN - 1
}
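Since the question applies this inside a loop over files, the check can be wrapped in a small helper. A minimal sketch, assuming that a 2 anywhere in ZN marks a dataframe coded as 1/2; the function name recode_zn is made up for this example:

```r
# Hypothetical helper: shift ZN down by one only when the column uses 1/2 coding
recode_zn <- function(df) {
  if (any(df$ZN == 2, na.rm = TRUE)) {
    df$ZN <- df$ZN - 1
  }
  df
}

df_A <- data.frame(ID = 1:4, ZN = c(0, 1, 1, 0))
df_B <- data.frame(ID = 1:4, ZN = c(2, 1, 1, 2))

recode_zn(df_A)$ZN  # left as is
recode_zn(df_B)$ZN  # 2 becomes 1, 1 becomes 0
```

Note that a dataframe coded 1/2 that happens to contain only 1's cannot be told apart from one coded 0/1 by this test.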
If there are only two values i.e. 0 and 1, then
df_A$ZN <- (df_A$ZN==0) + 1
df_A$ZN
#[1] 2 1 1 2
Or using case_when for multiple values
library(dplyr)
df_A %>%
mutate(ZN = case_when(ZN==0 ~2, TRUE ~ 1))

R enumerate duplicates in a dataframe with unique value

I have a dataframe containing a set of parts and test results. The parts are tested on 3 sites (North, Centre and South). Sometimes those parts are re-tested. I want to eventually create some charts that compare the results from the first time that a part was tested with the second (or third, etc.) time that it was tested, e.g. to look at tester repeatability.
As an example, I've come up with the below code. I've explicitly removed the "Experiment" column from the morley data set, as this is the column I'm effectively trying to recreate. The code works, however it seems that there must be a more elegant way to approach this problem. Any thoughts?
Edit - I realise that the example given was overly simplistic for my actual needs (I was trying to generate a reproducible example as easily as possible).
New example:
part<-as.factor(c("A","A","A","B","B","B","A","A","A","C","C","C"))
site<-as.factor(c("N","C","S","C","N","S","N","C","S","N","S","C"))
result<-c(17,20,25,51,50,49,43,45,47,52,51,56)
data<-data.frame(part,site,result)
data$index<-1
repeat {
if(!anyDuplicated(data[,c("part","site","index")]))
{ break }
data$index<-ifelse(duplicated(data[,1:2]),data$index+1,data$index)
}
data
   part site result index
1     A    N     17     1
2     A    C     20     1
3     A    S     25     1
4     B    C     51     1
5     B    N     50     1
6     B    S     49     1
7     A    N     43     2
8     A    C     45     2
9     A    S     47     2
10    C    N     52     1
11    C    S     51     1
12    C    C     56     1
Old example:
#Generate a trial data frame from the morley dataset
df<-morley[,c(2,3)]
#Set up an iterative variable
#Create the index column and initialise to 1
df$index<-1
# Loop through the dataframe looking for duplicate pairs of
# Runs and Indices and increment the index if it's a duplicate
repeat {
if(!anyDuplicated(df[,c(1,3)]))
{ break }
df$index<-ifelse(duplicated(df[,c(1,3)]),df$index+1,df$index)
}
# Check - The below vector should all be true
df$index==morley$Expt
We may use diff and cumsum on the 'Run' column to get the expected output. With this method we do not need to create an 'index' column of 1s first, but we do assume that 'Run' is ordered as shown in the OP's example.
indx <- cumsum(c(TRUE,diff(df$Run)<0))
identical(indx, morley$Expt)
#[1] TRUE
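To see why this works, here is a minimal sketch on a short made-up vector (run stands in for df$Run). Every time the sequence drops back down, diff() is negative, and cumsum() over those drops counts the blocks:

```r
run <- c(1, 2, 3, 1, 2, 3, 1, 2)

# diff() is negative exactly where a new block starts;
# the leading TRUE marks the start of the first block
starts <- c(TRUE, diff(run) < 0)
starts
# TRUE FALSE FALSE TRUE FALSE FALSE TRUE FALSE

# Each TRUE adds 1, so the running sum is the block number
indx <- cumsum(starts)
indx
# 1 1 1 2 2 2 3 3
```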
Or we can use ave
indx2 <- with(df, ave(Run, Run, FUN=seq_along))
identical(indx2, morley$Expt)
#[1] TRUE
Update
Using the new example
with(data, ave(seq_along(part), part, site, FUN=seq_along))
#[1] 1 1 1 1 1 1 2 2 2 1 1 1
Or we can use getanID from library(splitstackshape)
library(splitstackshape)
getanID(data, c('part', 'site'))[]
I think this is a job for make.unique, with some manipulation.
index <- 1L + as.integer(sub("\\d+(\\.)?","",make.unique(as.character(morley$Run))))
index <- ifelse(is.na(index),1L,index)
identical(index,morley$Expt)
[1] TRUE
Details of your actual data.frame may matter. However, a couple of options working with your example:
#this works if each group starts with 1:
df$index<-cumsum(df$Run==1)
#this is maybe more general, with data.table
require(data.table)
dt<-as.data.table(df)
dt[,index:=seq_along(Speed),by=Run]

How to access data in R from read.table

I got this text file:
a b c d
0 2 8 9
2 0 3 4
8 3 0 2
9 4 2 0
I put this command in R:
k<-read.table("d:/r/file.txt", header=TRUE)
Now I want to access the value in row 3, column 4 (which is 2). How can I access it?
Basically my question is how to access table data one by one? I want to use all data separately in nested for loops.
Like:
for(row=0;row<4;row++)
for(col=0;col<4;col++)
print data[row][col];
You may want to apply a certain operation to each element of a matrix.
Here is an example of how you could do it:
A <- matrix(1:16,4,4)
apply(A,c(1,2),function(x) {x %% 5})
And operation on the whole row
apply(A,1,function(x) sum(x^2))
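For simple arithmetic like this, R's operators are already vectorized over matrices, so the same results can be obtained without apply. A short sketch comparing the two forms (the variable names are only for illustration):

```r
A <- matrix(1:16, 4, 4)

# Element-wise: the operator works on the whole matrix at once
elementwise_apply  <- apply(A, c(1, 2), function(x) x %% 5)
elementwise_direct <- A %% 5

# Row-wise: rowSums() is the built-in shortcut for summing across each row
rowwise_apply  <- apply(A, 1, function(x) sum(x^2))
rowwise_direct <- rowSums(A^2)

all.equal(elementwise_apply, elementwise_direct)
all.equal(rowwise_apply, rowwise_direct)
```

apply remains useful when the per-element or per-row function has no vectorized equivalent.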
Is this what you want? :
test <- read.table("test.txt", header = T, fill = T)
for(i in 1:nrow(test)){
  for(j in 1:ncol(test)) {
    print(test[i,j])
  }
}
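For the single value asked about in the question, no loop is needed: a data frame can be indexed as [row, column], and R counts from 1 rather than 0. A minimal sketch that reads the same table from an inline string instead of d:/r/file.txt, so it runs without the file:

```r
k <- read.table(text = "a b c d
0 2 8 9
2 0 3 4
8 3 0 2
9 4 2 0", header = TRUE)

k[3, 4]    # row 3, column 4 by position
k[3, "d"]  # same cell, column chosen by name
k$d[3]     # same cell, via the column vector
```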

help with rle command

I'm having some trouble with an rle command that is designed to find the point at which participants reach 8 contiguous ones in a row.
For example, if:
x <- c(0,1,0,1,1,1,1,1,1,1,1,1)
I want to return a value of 11.
Thanks to DWin, I've been using this piece of code:
which( rle(x2)$values==1 & rle(x2)$lengths >= 8)
sum(rle(x)$lengths[ 1:(min(which(rle(x)$lengths >= 8))-1) ]) + 8
I've been using this code successfully to process my data. However, i noticed that it made a mistake when processing one of my data files.
For example, if
x <- c(1,1,1,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
the code returns 19, which is the point at which eight contiguous zeros in a row are reached. I'm not sure what is going wrong or how to fix it.
thanks in advance for your help.
Will
You need to paste the condition from the first line of code in its entirety into the second (written here with x throughout):
sum(rle(x)$lengths[ 1:(min(which( rle(x)$values==1 & rle(x)$lengths >= 8))-1) ]) + 8
[1] 39
However, here is another approach, using the function filter. This yields the same result in what I consider to be much more readable code:
which(filter(x, rep(1/8, 8), sides=1) == 1)[1]
[1] 39
The filter function when used in this way essentially computes a moving average over a block of 8 values in the vector. I then return the position of the first value where the moving average equals 1.
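A hedged variant of the same idea, sketched on the short vector from the question: with weights of 1 instead of 1/8, the filter returns a moving sum, and the test becomes "the window sum equals the window length", which avoids comparing fractions for other window sizes. The call is written as stats::filter because packages such as dplyr define their own filter; the names n, window_sums and first_hit are made up for this example:

```r
x <- c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
n <- 8

# Moving sum over the current value and the n - 1 values before it;
# the first n - 1 positions are NA because the window is incomplete
window_sums <- as.numeric(stats::filter(x, rep(1, n), sides = 1))

# First position where all n values in the window are 1
first_hit <- which(window_sums == n)[1]
first_hit
```

For this x the result is 11, the value the question asks for.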
In the basic programming course I teach, I advise students to give proper names to subresults, and to inspect these subresults:
lengthOfrepeatsOfAnything<-rle(x)$lengths
#4 2 5 11 2 2 3 2 17
whichRepeatsAreOfOnes<-rle(x)$values==1
#1 3 5 7 9
repeatsOfOnesLength<-lengthOfrepeatsOfAnything * whichRepeatsAreOfOnes #TRUE = 1, FALSE=0
#4 0 5 0 2 0 3 0 17
whichRepeatOfOneAreLongerThanEight<-which(repeatsOfOnesLength >= 8)
#9
result<-NA
if(length(whichRepeatOfOneAreLongerThanEight)>0){
  firstRepeatOfOneAreLongerThanEight<-whichRepeatOfOneAreLongerThanEight[1]
  #9
  if(firstRepeatOfOneAreLongerThanEight==1){
    result<-8
  }
  else{
    repeatsBeforeFirstEightOnes<-1:(firstRepeatOfOneAreLongerThanEight-1)
    #1 2 3 4 5 6 7 8
    lengthsOfRepeatsBeforeFirstEightOnes<-lengthOfrepeatsOfAnything[repeatsBeforeFirstEightOnes]
    #4 2 5 11 2 2 3 2
    result<-sum(lengthsOfRepeatsBeforeFirstEightOnes) + 8
  }
}
I know it doesn't look as dandy as a oneline solution, but it helps to make things clear and to pick up errors... Besides: what if you look back at this code in 4 months? Which one will be easier to understand again?
My advice would be to break the code up into simpler pieces. As suggested by @Nick, you want to write code that can be easily debugged, and modular coding allows you to do that.
# find runs of 0s and 1s
run_01 = rle(x)
# find the first run of 1's with length >= 8
run_1 = with(run_01, which(values == 1 & lengths >= 8))[1]
# count the elements before that run (seq_len copes with run_1 == 1)
start_pos = sum(run_01$lengths[seq_len(run_1 - 1)])
# add 8 to it
end_pos = start_pos + 8
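The same steps can be collected into one function that also covers the edge cases: no qualifying run, and a qualifying run at the very start. This is a minimal sketch; the function name first_run_end and its argument n are made up for illustration:

```r
# Position at which the first run of n contiguous 1s is completed,
# or NA if there is no such run
first_run_end <- function(x, n = 8) {
  runs <- rle(x)
  hit <- which(runs$values == 1 & runs$lengths >= n)
  if (length(hit) == 0) {
    return(NA_integer_)
  }
  # seq_len(0) is empty, so a run at the very start contributes a sum of 0
  sum(runs$lengths[seq_len(hit[1] - 1)]) + n
}

x_short <- c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1)
x_long  <- c(1,1,1,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,
             1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)

first_run_end(x_short)  # 11
first_run_end(x_long)   # 39
```

Both results match the values given earlier in the thread.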
