How to access data in R from read.table - r

I got this text file:
a b c d
0 2 8 9
2 0 3 4
8 3 0 2
9 4 2 0
I put this command in R:
k<-read.table("d:/r/file.txt", header=TRUE)
Now I want to access the value in row 3, column 4 (which is 2). How can I access it?
Basically, my question is how to access the table data element by element. I want to use each value separately in nested for loops.
Like:
for(row=0;row<4;row++)
for(col=0;col<4;col++)
print data[row][col];

You may want to apply an operation to each element of a matrix.
Here is an example of how you could do it:
A <- matrix(1:16,4,4)
apply(A,c(1,2),function(x) {x %% 5})
And an operation on each whole row:
apply(A,1,function(x) sum(x^2))
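For simple arithmetic like this, apply() is not strictly needed, because R's arithmetic operators already work element-wise on matrices. The sketch below compares the apply() calls above with their vectorized equivalents; the variable names are chosen here for illustration only.

```r
A <- matrix(1:16, 4, 4)

# Element-wise operation: apply() over rows and columns vs. plain arithmetic
elementwise_apply <- apply(A, c(1, 2), function(x) x %% 5)
elementwise_vec   <- A %% 5

# Row-wise operation: apply() over rows vs. rowSums()
rowwise_apply <- apply(A, 1, function(x) sum(x^2))
rowwise_vec   <- rowSums(A^2)
```

Both pairs should hold the same values; the vectorized forms are usually shorter and faster.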

Is this what you want?
test <- read.table("test.txt", header = TRUE, fill = TRUE)
for(i in 1:nrow(test)){
for(j in 1:ncol(test)) {
print(test[i,j])
}
}
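To read a single cell directly, a data frame can be indexed as [row, column]. The sketch below rebuilds the table from the question with read.table(text = ...) so that it runs without the file on disk; note that R indices start at 1, not 0.

```r
# Same contents as the text file in the question, supplied inline
k <- read.table(text = "a b c d
0 2 8 9
2 0 3 4
8 3 0 2
9 4 2 0", header = TRUE)

v1 <- k[3, 4]     # row 3, column 4 by position
v2 <- k$d[3]      # column "d" by name, then element 3
v3 <- k[[4]][3]   # column 4 as a vector, then element 3
```

All three expressions pick out the same cell, which holds the value 2.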

Related

Mutate a value at a specific position in a column (Index) depending on a condition in R

I am trying to use mutate() to set a 0 or 1 at a specific position in a column. Normally mutate() changes the whole column, but I want to check a condition and then place a value at a specific position. I tried to use something like an index. Here is an example: I have values and want to compare them pairwise in order: 10 to 16, 16 to 9, and so on. The criterion is whether the two values are either both in a or both not in a (0), or whether one is in a and the other is not (1). I wrote down an approach, but it seems that mutate() does not allow TaskS[i+1] as a target.
Thanks for your help!
Index  Values  TaskS
1      10
2      16      1
3      9       1
4      8       0
a <- c(1:10)
data_time_filter <- mutate(data_time_filter, TaskS = '')
for (i in 1:40){
current <- data_time_filter$Trial_Id[i] %in% a
adjacent <- data_time_filter$Trial_Id[i+1] %in% a
if (current == adjacent){
data_time_filter <- mutate(data_time_filter, TaskS[i+1] = 0)
}
else if (current != adjacent){
data_time_filter <- mutate(data_time_filter, TaskS[i+1] = 1)
}
}
I am not really sure I understand your question correctly, but I will try to help anyway.
In my approach I use a user-defined function in combination with sapply. I believe that for mutate to work correctly you need a vector output, which you won't get from assigning inside a loop.
So, here is what I did:
library(dplyr)  # provides mutate() and %>%
# Recreate df
data_time_filter <- data.frame(
index = 1:4,
Values = c(10, 16, 9, 8)
)
# Create filter
ff <- c(1:10)
# Add empty TaskS column
data_time_filter <- data_time_filter %>%
mutate(TaskS = '')
# Define a function
abc <- function(data, filter){
l <- length(data)
sapply(1:l, function(x){
if(x == 1){
""
} else {
current <- data[x-1] %in% filter
adjacent <- data[x] %in% filter
if(current == adjacent){
0
} else {
1
}
}
})
}
This approach will let you use mutate:
> data_time_filter
index Values TaskS
1 1 10
2 2 16
3 3 9
4 4 8
> data_time_filter %>%
mutate(TaskS = abc(Values, ff))
index Values TaskS
1 1 10
2 2 16 1
3 3 9 1
4 4 8 0
You could even skip making the placeholder TaskS column and create a new one:
> data_time_filter %>%
mutate(TskS_new = abc(Values, ff))
index Values TaskS TskS_new
1 1 10
2 2 16 1
3 3 9 1
4 4 8 0
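The same comparison can also be written without sapply or a loop, by comparing the membership vector with a shifted copy of itself. This is only a sketch in base R under the assumptions of the example above (column Values, filter 1:10); it puts NA rather than an empty string in the first row, which keeps the column numeric.

```r
data_time_filter <- data.frame(index = 1:4, Values = c(10, 16, 9, 8))
a <- 1:10

# TRUE where the value is in the filter set
in_a <- data_time_filter$Values %in% a

# Compare each row with the previous one; the first row has no
# predecessor, so it gets NA
data_time_filter$TaskS <- c(NA, as.integer(in_a[-1] != in_a[-length(in_a)]))
```

For the four example values this gives NA, 1, 1, 0, matching the sapply-based output above apart from the first row.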

R function similar to Excel's match?

Good day
1) Is there an R function similar to Excel's MATCH function?
2) I've written my own, shown below (it is lengthy).
Could anybody suggest what could be improved, or another way to do it?
fmatch2<-function(ss1, ss2) { #ss1 correspond the first argument of Excel match function. ss2 for the second.
fmatch<-function(ii,ss) { # return location in ss where ii match.
if (length(which(ss==ii))>0 ) {
rr<- min(which(ss==ii))
} else {
if (length(which(ss>ii))>0)
{rr<-min(which(ss>ii))-1 }
}
return(rr)
}
rr<-list()
n<-1
for ( x in ss1 ) { # apply fmatch to each member in ss1
nn<-fmatch(x,ss2[1:n])
rr<-rbind(rr,nn)
n<-n+1
}
as.vector(unlist(rr[,1]))
}
Usage of the function fmatch2 is shown below.
It mimics Excel's "=MATCH(H1,$I$1:1,1)". The elements named "ch" and "ci" correspond to column H and column I. The result is the column named cn.
x<-data.frame(cf=c(0,1,2,3,4,5),ch=c(0,0,3,6,6,6),ci=c(0,0,3,7,11,13))
y<-data.frame(cf=c(0,1,2,3,4,5),ch=c(0,0,3,6,6,6),ci=c(0,0,3,7,11,13),cn=fmatch2(x[[2]],x[[3]]))
Of course, I am not entirely sure what you're trying to do, as I'd expect your fmatch2 function to return NA for ch==6 (because 6 is not present in ci), but I love doing things using dplyr:
library(dplyr)
result <- x %>% # "%>%" means "and then"
mutate(chInCi = match(ch, x$ci)) #adds a column named "chInCi" with the position in ci of the first match of the value in ch
result
cf ch ci chInCi
1 0 0 0 1
2 1 0 0 1
3 2 3 3 3
4 3 6 7 NA
5 4 6 11 NA
6 5 6 13 NA
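For the approximate lookup that Excel's MATCH performs with match type 1 (as I understand it, the position of the largest value less than or equal to the lookup value in an ascending range), base R's findInterval() may be a closer fit than match(). The sketch below applies both to the whole ci column rather than to the growing range used in fmatch2, so it illustrates the two functions and is not a drop-in replacement.

```r
ch <- c(0, 0, 3, 6, 6, 6)
ci <- c(0, 0, 3, 7, 11, 13)   # must be sorted ascending for findInterval()

# Exact lookup: position of the first exact match, NA when absent
exact <- match(ch, ci)

# Approximate lookup: number of elements of ci that are <= each value of ch
approx <- findInterval(ch, ci)
```

Here exact is 1, 1, 3, NA, NA, NA and approx is 2, 2, 3, 3, 3, 3. For tied values findInterval() points at the last tie, whereas fmatch2 takes the first.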

retrieving a list element in O(1) in R

Suppose I have the following:
a <- vector('list',50)
for(i in 1:50)
{
a[[i]] <- list(path=paste0("file",sample(0:600,1)),contents=sample(1:5,10*i,replace=TRUE))
}
Now, for example, I want to retrieve the contents of file45 (assuming it exists in this randomly generated data) as fast as possible.
I have tried the following:
contents <- unlist(Filter(function(x) x$path=="file45",a),recursive=FALSE)$contents
However, the list-searching overhead makes reading from memory even slower than reading directly from disk (to some extent).
Is there any other way of retrieving the contents that is reasonably faster than reading from disk, ideally O(1)?
Edit: assume that there are no duplicate file paths in my sublists and that there are far more than 50 sublists.
Use the names attribute to track the items instead:
a <- vector('list',50)
for(i in 1:50)
{
a[[i]] <- list(contents=sample(1:5,10*i,replace=TRUE))
}
names(a) <- paste0("file",sample(1:600,50))
a[["file45"]]
NULL
a[["file25"]]
$contents
[1] 3 1 3 1 2 5 1 5 1 2 3 1 4 1 1 4 1 5 1 5 1 4 5 2 5 2 2 5 1 1
Try the following:
a[sapply(a, function(x) x$path == "file45")][[1]]$contents
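Another option worth considering is an environment, which R implements as a hash table when created with hash = TRUE, so lookup by key does not need to scan the elements. The sketch below uses made-up, deterministic data (paths file1 to file50) instead of the random sample from the question, and the name index is chosen here for illustration.

```r
# Hypothetical data: unique paths, as the question's edit assumes
paths <- paste0("file", 1:50)
a <- lapply(seq_along(paths), function(i) {
  list(path = paths[i], contents = rep(i, 3))
})

# Build the index once: key = path, value = contents
index <- new.env(hash = TRUE)
for (item in a) {
  assign(item$path, item$contents, envir = index)
}

# Lookup by key; [[ returns NULL when the key is absent
contents <- index[["file45"]]
has_file <- exists("file999", envir = index, inherits = FALSE)
```

Building the index costs one pass over the list, so this pays off mainly when many lookups follow.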

Double for loop to save several files using R

I am trying to do a “for loop” to generate files based on the column "group". I want to create a file for each group. My data is much bigger, but a sample would be:
id = c(1,2,3,4,5,6,7,8,9,10)
group = c(3,1,3,2,1,3,1,2,4,4)
weight = c(10,11,12,13,14,15,16,17,18,19)
index1 = c(50,50,50,50,50,50,50,50,50,50)
index2 = c(50,50,50,50,50,50,50,50,50,50)
data = data.frame(id,group,weight,index1,index2)
for (i in unique(data$group)){
for (j in 1:nrow(data)){
data$weight[j] = ifelse(data$group[j] == data$group[i], 0,data$weight[j])
data$index1[j] = ifelse(data$group[j] == data$group[i], 0,50)
data$index2[j] = ifelse(data$group[j] == data$group[i], 5,50)
}
write.table(data,paste("/home/paulaf/test/",data$group[i],".txt",sep=""),
quote=F,row.names=F,col.names=T)}
It seems to work, but it doesn’t write all the files. Any help would be very much appreciated. Thanks in advance.
Paula,
That code is actually writing four files. But you're overwriting one of those files, so you're only ending up with three.
When you name the file with paste, you're using data$group[i] to generate the name. If you look at those names by using cat() or something similar, you'll notice you have two 3.txt files.
/home/paulaf/test/3.txt
/home/paulaf/test/3.txt
/home/paulaf/test/1.txt
/home/paulaf/test/2.txt
So, that's why you're not getting all of your files. Your first 3.txt is overwritten.
Looking a bit more closely at your data object, you can see why this happened.
Your i in your loops is going to have the values 3, 1, 2, and 4. By plugging 1-4 into data$group[i], you're actually pulling out the value of the 1-4th rows in the data$group. Notice that the first and third rows are both group 3.
id group weight index1 index2
1 1 3 0 50 50
2 2 1 0 50 50
3 3 3 0 50 50
4 4 2 0 0 5
5 5 1 0 50 50
6 6 3 0 50 50
7 7 1 0 50 50
8 8 2 0 0 5
9 9 4 18 50 50
10 10 4 19 50 50
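The index mix-up can be reproduced in isolation. In this small sketch, loop_values holds what i takes in the outer loop and names_used holds what data$group[i] returns for those values; both variable names are made up for illustration.

```r
group <- c(3, 1, 3, 2, 1, 3, 1, 2, 4, 4)

loop_values <- unique(group)      # the values i takes: 3, 1, 2, 4
names_used  <- group[loop_values] # what data$group[i] yields: rows 3, 1, 2, 4
```

names_used comes out as 3, 3, 1, 2, which is why 3.txt is written twice and 4.txt never appears.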
Maybe replace your write.table() with this:
write.table(data,paste("/home/paulaf/test/",i,".txt",sep=""),
quote=F,row.names=F,col.names=T)
And one other note to save you future headaches: it's often helpful to print some of your variables to the console. It's a simple way to get some insight into what's happening.
Also, good luck, keep working with R, you're doing great!
unique(data$group) is a vector of length 4. data$group has a length of 10. You're setting the filenames to the first 4 values of data$group instead of the unique values of data$group.
Try replacing data$group[i] with just i inside the paste that generates the filename, e.g.
for (i in unique(data$group)){
for (j in 1:nrow(data)){
data$weight[j] = ifelse(data$group[j] == data$group[i], 0,data$weight[j])
data$index1[j] = ifelse(data$group[j] == data$group[i], 0,50)
data$index2[j] = ifelse(data$group[j] == data$group[i], 5,50)
}
fileName = paste("/home/paulaf/test/",i,".txt",sep="")
write.table(data,fileName,quote=F,row.names=F,col.names=T)
}
Your problem is very simple. Inside your write.table function, you're pasting the name using data$group[i], but your outer loop is not looping over the indices of the unique groups, but over the group names themselves. Your values of i are 3 1 2 4, so calling data$group[i] for each of those will result in 3, 3, 1, 2, which means the filenames are wrong (one file is replaced and you end up with only 3, for this sample). The solution is then:
write.table(data,paste("/home/paulaf/test/",i,".txt",sep=""),
quote=F,row.names=F,col.names=T)}
It's also slightly more efficient (and easier to read, imho) to use paste0, so:
write.table(data,paste0("/home/paulaf/test/",i,".txt"),
quote=F,row.names=F,col.names=T)}
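Putting the pieces together, here is one possible vectorized version of the whole loop. It rests on two assumptions that the question does not confirm: that the comparison inside the loop was meant to be against the current group i rather than data$group[i], and that each file should start from the original data rather than accumulate the changes made for earlier groups. The output directory is a temporary one chosen for the sketch, not the path from the question.

```r
data <- data.frame(
  id     = 1:10,
  group  = c(3, 1, 3, 2, 1, 3, 1, 2, 4, 4),
  weight = 10:19,
  index1 = 50,
  index2 = 50
)

out_dir <- tempdir()  # hypothetical output directory for this sketch

for (g in unique(data$group)) {
  d <- data                   # fresh copy, so edits do not accumulate (assumption)
  in_group <- d$group == g    # rows belonging to the current group
  d$weight[in_group] <- 0
  d$index1[in_group] <- 0
  d$index2[in_group] <- 5
  write.table(d, file.path(out_dir, paste0(g, ".txt")),
              quote = FALSE, row.names = FALSE, col.names = TRUE)
}

# One file per unique group should now exist
written <- file.exists(file.path(out_dir, paste0(unique(data$group), ".txt")))
```

Replacing the inner loop over rows with a logical index also removes the need for ifelse() on single elements.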

How to assign number of repeats to dataframe based on elements of an identifying vector in R?

I have a dataframe with individuals assigned a text id that concatenates a place-name with a personal id (see data, below). Ultimately, I need to do a transformation of the data set from "long" to "wide" (e.g., using "reshape") so that each individual comprises one row, only. In order to do that, I need to assign a "time" variable that reshape can use to identify time-varying covariates, etc. I have (probably bad) code to do this for individuals that repeat up to two times, but need to be able to identify up to 18 repeated occurrences. The code below works fine if I remove the line preceded by the hash, but only identifies up to two repeats. If I leave that line in (which would seem necessary for individuals repeated more than twice), R chokes, giving the following error (presumably because the first individual is repeated only twice):
Error in if (data$uid[i] == data$uid[i - 2]) { :
argument is of length zero
Can anyone help with this? Thanks in advance!
place <- rep("ny",10)
pid <- c(1,1,2,2,2,3,4,4,5,5)
uid<- paste(place,pid,sep="")
time <- rep(0,10)
data <- cbind(uid,time)
data <- as.data.frame(data)
data$time <- as.numeric(data$time)
#bad code
data$time[1] <- 1 #need to set first so that loop doesn't go to a row that doesn't exist (i.e., row 0)
for (i in 2:NROW(data)){
data$time[i] <- 1 #set first occurrence to 1
if (data$uid[i] == data$uid[i-1]) {data$time[i] <- 2} #set second occurrence to 2, etc.
#if (data$uid[i] == data$uid[i-2]) {data$time[i] <- 3}
i <- i+1
}
It's unclear what you are trying to do, but I think you're saying that you need to create a time index for each row by every unique uid. Is that right?
If so, give this a whirl
library(plyr)
ddply(data, "uid", transform, time = seq_along(uid))
Will give you something like:
uid time
1 ny1 1
2 ny1 2
3 ny2 1
4 ny2 2
5 ny2 3
....
Is this what you have in mind?
> d <- data.frame(uid = paste("ny",c(1,2,1,2,2,3,4,4,5,5),sep=""))
> out <- do.call(rbind, lapply(split(d, d$uid), function(x) {x$time <- 1:nrow(x); x}))
> rownames(out) <- NULL
> out
uid time
1 ny1 1
2 ny1 2
3 ny2 1
4 ny2 2
5 ny2 3
6 ny3 1
7 ny4 1
8 ny4 2
9 ny5 1
10 ny5 2
Using your data frame setup:
place <- rep("ny",10)
pid <- c(1,1,2,2,2,3,4,4,5,5)
uid<- paste(place,pid,sep="")
time <- rep(0,10)
data <- cbind(uid,time)
data <- as.data.frame(data)
You can use:
data$time <- sequence(table(data$uid))
data
To get:
> data
uid time
1 ny1 1
2 ny1 2
3 ny2 1
4 ny2 2
5 ny2 3
6 ny3 1
7 ny4 1
8 ny4 2
9 ny5 1
10 ny5 2
NOTE: Your data.frame MUST be sorted by uid first for this to work.
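If sorting first is inconvenient, base R's ave() can number the occurrences within each uid while leaving the rows in their original order. A minimal sketch with a small, deliberately unsorted example (the data here is made up for illustration):

```r
data <- data.frame(uid = c("ny1", "ny2", "ny1", "ny2", "ny2"),
                   stringsAsFactors = FALSE)

# For each uid group, replace the row positions with 1, 2, 3, ...
data$time <- ave(seq_along(data$uid), data$uid, FUN = seq_along)
```

For this example time comes out as 1, 1, 2, 2, 3, so each row gets its occurrence number regardless of where it sits in the data frame.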
After trying the above solutions on large data sets, I decided to write my own loop for this. It was very time-consuming and still required the data to be broken into 50k-element vectors, but it did work in the end:
system.time( for(i in 2:length(data$uid)) {
if(data$uid[i]==data$uid[i-1]) data$repeats[i] <- data$repeats[i-1]+1
if ((i %% 1000)== 0) { #helps to keep track of how far the loop has gotten
print(i) }
i+1
}
)
Thanks to all for your help.
