So I have data like this
Date DJIA Time
1 1/1/96 5117.12 1
2 1/2/96 5177.45 2
3 1/3/96 5194.07 3
4 1/4/96 5173.84 4
5 1/5/96 5181.43 5
6 1/8/96 5197.68 6
I want to decrement the values in the Time column by 1 and remove the first row.
I've achieved both of these steps separately:
data[-1,]
removes the first row, while
data$Time - 1
decrements, but returns only the decremented column as a vector.
How do I make it so that I get something like this
Date DJIA Time
1 1/2/96 5177.45 1
2 1/3/96 5194.07 2
3 1/4/96 5173.84 3
4 1/5/96 5181.43 4
5 1/8/96 5197.68 5
?
I've also tried
data[-1,]$Time - 1
but this again returns only the Time vector decremented by 1, rather than changing the entire data frame.
This you got right:
data[-1,]
data$Time - 1
But, as you said, each of these returns a new object; it doesn't change what you already have. So you just need to assign the results back to data:
data <- data[-1,]
data$Time <- data$Time - 1
To better understand: newData <- data[-1,] creates a new data frame without the first row. If you want to transform your original data frame, you need to reassign it with data <- .... The same goes for columns or rows: you need to do data$column <- ....
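If you prefer a single step, here's a small sketch using base R's transform() that is equivalent to the two assignments above:
data <- transform(data[-1, ], Time = Time - 1)  # drop row 1 and decrement Time together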
I am trying to convert data that I have in a txt file:
4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;...
to a column (table) where the values are separated by tab.
4.0945725440979
4.07999897003174
4.0686674118042...
So far I tried
mydata <- read.table("1.txt", header = FALSE)
separate_data <- strsplit(as.character(mydata), ";")
But it does not work. separate_data in this case consists of only 1 element:
[[1]]
[1] "1"
The OP doesn't directly state whether the raw data file contains multiple observations of a single variable, or should be broken into n-tuples. Since the OP does state that read.table() produces a single row where multiple rows are expected, we can conclude that the correct technique is to use scan(), not read.table().
If the data in the raw data file represents a single variable, then the solution posted in comments by #docendo works without additional effort. Otherwise, additional work is required to tidy the data.
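For the single-variable case, that comment solution is presumably a one-line scan() (a sketch, assuming the file is named 1.txt as in the question):
value <- scan("1.txt", sep = ";")  # reads all semicolon-separated numbers into one vector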
Here is an approach using scan() that reads the file into a vector, and breaks it into observations containing 5 variables.
rawData <- "4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512"
value <- scan(textConnection(rawData), sep = ";")
columns <- 5                               # set desired # of columns
observations <- length(value) / columns    # number of complete observations
observation <- rep(1:observations, each = columns)   # observation id for each value
variable <- rep(1:columns, times = observations)     # variable id within each observation
data.frame(observation, variable, value)
...and the output:
> data.frame(observation,variable,value)
observation variable value
1 1 1 4.094573
2 1 2 4.079999
3 1 3 4.068667
4 1 4 4.059601
5 1 5 4.052183
6 2 1 4.094573
7 2 2 4.079999
8 2 3 4.068667
9 2 4 4.059601
10 2 5 4.052183
>
At this point the data can be converted into a wide format tidy data set with reshape2::dcast().
Note that this solution requires that the number of data values in the raw data file is evenly divisible by the number of variables.
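That reshaping step might look like this (a sketch, reusing the observation, variable, and value vectors built above):
library(reshape2)
longData <- data.frame(observation, variable, value)
wideData <- dcast(longData, observation ~ variable, value.var = "value")  # one row per observation, one column per variable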
I'm fairly new to R and I have a question I'd like to ask you guys!
I have a large data frame with TimeStamps from an eyetracking experiment.
ParticipantName RecordingTimestamp GazeEventType GazeEventDuration AOI[Prob]Hit AOI[Prob 2]Hit
1 1 -1255 NA NA NA
2 1 -1252 Fixation 933 NA NA
3 1 -1249 Fixation 933 NA NA
4 1 -1245 Fixation 933 NA NA
5 1 -1242 Fixation 933 NA NA
6 1 -1239 Fixation 933 NA NA
I have in another data frame the Event triggers. I've manipulated them and now I have two columns, one with the start time and the other with the finish time.
ParticipantName TimeStamp Event EventFinish
1 1 6593 10 4593
2 1 27235 2 25235
3 1 27392 10 25392
4 1 47278 1 45278
5 1 47440 10 45440
6 1 71857 2 69857
Where TimeStamp is the end of the event and EventFinish is the start (I mixed up the names there, sorry!).
So, that first data frame has around 140,000 rows and starts at a timestamp of -1255 and goes up to 455251.
I need to use the values from the second data frame to subset or filter the first: keep the rows whose RecordingTimestamp is higher than EventFinish and lower than TimeStamp, for each of the 24 rows in the second data frame.
That way I'd eliminate the rows that are not my epoched data.
I've looked into loops, filters and subsets, but I was not able to get it right.
I've tried this code:
for (i in 1:EventPP1$TimeStamp) {
  for (j in 1:EventPP1$EventFinish) {
    test3 <- subset(Participant1, Participant1$RecordingTimestamp >= EventPP1$EventFinish[j] &
                      Participant1$RecordingTimestamp <= EventPP1$TimeStamp[i])
  }
}
But R just goes on and on and never gets to a final answer.
I've also tried this code:
uniq1 <- unique(unlist(EventPP1$TimeStamp))
uniq1 <- as.data.frame(uniq1)
uniq2 <- unique(unlist(EventPP1$EventFinish))
uniq2 <- as.data.frame(uniq2)
for (i in 1:seq_along(uniq1)) {
  for (j in 1:seq_along(uniq2)) {
    test3 <- filter(Participant1, Participant1$RecordingTimestamp >= uniq2[j] &
                      Participant1$RecordingTimestamp <= uniq1[i])
  }
}
But it only gives me the final pair, not the rest.
Can anyone help me, please? That's data from only one participant and I have about 80 more. I think a loop might be the more suitable method.
If anyone know how to solve this I'd appreciate it!
Thanks!
Roberto
You only have to loop once, and not over the timestamps but over the rows of your second data frame. For every row you have a value for the beginning and one for the end, and you can flag all data in the first data frame that fall within those limits. If I call your data frames p (for Participant) and E (for Events), and give the columns short names:
p$flag <- 0                                   # initialize the flag column
for (i in seq_len(nrow(E))) {                 # note: you loop over the 24 rows only
  p$flag[p$ts >= E$start[i] & p$ts <= E$fin[i]] <- 1   # set the flag to 1 for good data
}
p <- p[p$flag == 1, ]                         # restrict your data frame to flagged rows
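Since you mention about 80 more participants, the same flagging logic could be wrapped in a function and applied per participant. A sketch, where participants (a list of data frames, one per person) and events are hypothetical names:
epoch_participant <- function(p, E) {
  p$flag <- 0
  for (i in seq_len(nrow(E))) {
    p$flag[p$ts >= E$start[i] & p$ts <= E$fin[i]] <- 1
  }
  p[p$flag == 1, ]  # keep only rows that fall inside some event window
}
epoched <- lapply(participants, epoch_participant, E = events)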
I am trying to run a cumsum on two separate columns of a data frame. They are essentially tabulations of events for two different variables, and only one variable can have an event recorded per row. The way I attacked the problem was to create a new variable holding the value ‘1’ and two new columns to sum each variable's total. This works fine and I get the correct total number of occurrences, but the problem is that in my current ifelse statement, if the event recorded is for variable “A”, then variable “B” is assigned 0. Instead, for every row, I want the variable's previous value carried forward, so that I don't end up with gaps where a column goes from 1 to 2, to 0, to 3.
I don't want to run summarize on this either; I would prefer to keep each recorded instance and add the new columns through mutate.
CURRENT DF:
Event Value Variable Total.A Total.B
1 1 A 1 0
2 1 A 2 0
3 1 B 0 1
4 1 A 3 0
DESIRED RESULT:
Event Value Variable Total.A Total.B
1 1 A 1 0
2 1 A 2 0
3 1 B 2 1
4 1 A 3 1
Thanks!
You can use the property that booleans are summed as ones and zeros, so you can use the cumsum function:
DF$Total.A <- cumsum(DF$Variable == "A")
Or as a more general approach, provided by #Frank you can do:
uv <- unique(as.character(DF$Variable))
DF[, paste0("Total.", uv)] <- lapply(uv, function(x) cumsum(DF$Variable == x))
If you have many levels to your factor, you can get this in one line by dummy coding and then cumsuming the matrix.
X <- model.matrix(~ Variable + 0, DF)  # one 0/1 dummy column per factor level
apply(X, 2, cumsum)                    # running total for each level
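Since you mention wanting to add the new columns through mutate, here is the same idea as a dplyr sketch:
library(dplyr)
DF <- DF %>%
  mutate(Total.A = cumsum(Variable == "A"),
         Total.B = cumsum(Variable == "B"))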
I would like to match each element of the second column of a data frame against the first column, and create trigrams using the matched element as the middle element of the trigram. In case of no match, the middle and last elements of the trigram will be the unmatched second-column element. Here is an example:
gdf <- data.frame(from=c(1,2,3,4,5),to=c(2,3,1,5,6),stringsAsFactors=FALSE)
gdf
# from to
# 1 2
# 2 3
# 3 1
# 4 5
# 5 6
The output trigrams are as follows:
from middle to
1 2 3
2 3 1
3 1 2
4 5 6
5 6 6
My code with a for loop takes a long time to process my huge data set; it has 54304 rows.
This is what I wrote:
num <- nrow(gdf)
df2 <- data.frame(from = character(0), middle = character(0), to = character(0),
                  stringsAsFactors = FALSE)
count <- rep(0, nrow(gdf))
for (row in 1:nrow(gdf)) {
  for (rowc in 1:nrow(gdf)) {
    if (gdf[rowc, ]$from == gdf[row, ]$to) {
      df2[nrow(df2) + 1, ] <- c(gdf[row, ]$from, gdf[row, ]$to, gdf[rowc, ]$to)
      count[row] <- row
    }
  }
  if (count[row] == 0) {
    df2[nrow(df2) + 1, ] <- c(gdf[row, ]$from, gdf[row, ]$to, gdf[row, ]$to)
  }
}
Any help would be greatly appreciated!
Not sure if your example is too simple for this to work on the real data set, but a simple merge works for the example. Then I reorder the columns to get them back in order, since merge() places the column you merge by as column 1.
Merged <- merge(gdf, gdf, by.x = "to", by.y = "from")[, c(2, 1, 3)]
names(Merged) <- c("from", "middle", "to")  # merge() leaves two columns named "to"
Then you can add in the no-match elements using a row bind:
noMatch <- gdf[!paste(gdf$from, gdf$to) %in% paste(Merged$from, Merged$middle), ]
result <- rbind(Merged, setNames(noMatch[, c(1, 2, 2)], names(Merged)))
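Note that merge() does not preserve the original row order, so if that order matters you can sort the combined result (a small follow-up to the sketch above):
result <- result[order(result$from), ]  # restore the original from order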
This seems to be basic, but I can't get it. I am trying to compute the frequency table in R for the data below
1 2
2 1
3 1
I want to export the two-way frequencies to CSV output, where the rows are all the unique entries in column A of the data, the columns are all the unique entries in column B, and the cell values are the number of times each combination occurs. I have explored some constructs like table, but I am not able to output the values correctly in CSV format.
Desired output for the sample data:
"","1","2"
"1",0,1
"2",1,0
"3",1,0
The data:
df <- read.table(text = "1 2
2 1
3 1")
Calculate frequencies using table:
(If your object is a matrix, you could convert it to a data frame using as.data.frame before using table.)
tab <- table(df)
V2
V1 1 2
1 0 1
2 1 0
3 1 0
Write data with the function write.csv:
write.csv(tab, "tab.csv")
The resulting file:
"","1","2"
"1",0,1
"2",1,0
"3",1,0