Making trigrams from a data frame in R

I would like to match each element of the second column against the first column of a data frame, and build a trigram with the matched element as the middle element. If there is no match, the unmatched second-column element is used as both the middle and the last element of the trigram. Here is an example:
gdf <- data.frame(from=c(1,2,3,4,5),to=c(2,3,1,5,6),stringsAsFactors=FALSE)
gdf
# from to
# 1 2
# 2 3
# 3 1
# 4 5
# 5 6
The output trigrams are as follows:
from middle to
1 2 3
2 3 1
3 1 2
4 5 6
5 6 6
My code uses nested for loops and takes a long time on my real data set, which has 54304 rows.
This is what I wrote:
num <- nrow(gdf)
df2 <- data.frame(from=character(0),middle=character(0),to=character(0),stringsAsFactors=FALSE)
count <- rep(0,nrow(gdf))
for(row in 1:nrow(gdf)){
for(rowc in 1:nrow(gdf)){
if(gdf[rowc,]$from==gdf[row,]$to){
df2[nrow(df2)+1,]<-c(gdf[row,]$from,gdf[row,]$to,gdf[rowc,]$to)
count[row]<-row
}
}
if(count[row]==0){
df2[nrow(df2)+1,]<-c(gdf[row,]$from,gdf[row,]$to,gdf[row,]$to)
}
}
Any help would be greatly appreciated!

Not sure if your example is too simple for this to work on the real data set, but a simple merge works for the example. A merge places the column you merge by first, so I reorder the columns afterwards and name them explicitly (depending on the R version, the third column may otherwise come back as to.1 or to.y, which can stop the rbind below from matching names).
Merged <- setNames(merge(gdf,gdf,by.x="to",by.y="from")[,c(2,1,3)],c("from","middle","to"))
Then you can add the no-match rows with a row bind:
rbind(Merged,setNames(gdf[! paste(gdf[,1],gdf[,2]) %in% paste(Merged[,1],Merged[,2]),c(1,2,2)],names(Merged)))
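As an alternative to binding the unmatched rows on afterwards, the same result can probably be obtained in one merge by keeping all rows of the left table. This is only a sketch on the example data: the names lookup, last, unmatched and tri are made up for illustration, and it assumes the real data has no NA values in the to column.

```r
gdf <- data.frame(from = c(1, 2, 3, 4, 5), to = c(2, 3, 1, 5, 6))

# Lookup table: a row's `from` becomes the join key `to`, its `to` becomes `last`.
lookup <- setNames(gdf, c("to", "last"))

# all.x = TRUE keeps rows of gdf whose `to` has no partner; their `last` is NA.
tri <- merge(gdf, lookup, by = "to", all.x = TRUE)

# Unmatched rows repeat the middle element as the last element.
unmatched <- is.na(tri$last)
tri$last[unmatched] <- tri$to[unmatched]

# Restore the column order, sort by `from`, and use the requested names.
tri <- tri[order(tri$from), c("from", "to", "last")]
names(tri) <- c("from", "middle", "to")
rownames(tri) <- NULL
tri
```

Like the nested loops in the question, a merge returns one row per match, so a to value that appears several times in from yields several trigrams.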

Related

Convert the rows of frequency "table" (NOT matrix or dataframe) to separate lists [duplicate]

This question already has answers here:
How to convert a table to a data frame
(5 answers)
Closed last month.
I'm building a frequency table of frequencies, and I want to convert the table to two lists of numbers.
numbers <- c(1,2,3,4,1,2,3,1,2,3,1,2,3,4,2,3,5,1,2,3,4)
freq_of_freq <- table(table(numbers))
> freq_of_freq
1 3 5 6
1 1 1 2
From the table freq_of_freq, I'd like to create two lists, x and y: one containing the numbers 1,3,5,6 and the other containing the frequency values 1,1,1,2.
I tried x <- freq_of_freq[ 1 , ] and y <- freq_of_freq[ 2 , ], but this doesn't work.
Any help greatly appreciated. Thanks
One approach is to use stack(), which returns a data frame with the counts in values and the original frequencies in ind.
numbers <- c(1,2,3,4,1,2,3,1,2,3,1,2,3,4,2,3,5,1,2,3,4)
freq_of_freq <- table(table(numbers))
stack(freq_of_freq)
#> values ind
#> 1 1 1
#> 2 1 3
#> 3 1 5
#> 4 2 6
To exactly match your expected output, you could do:
x = as.integer(names(freq_of_freq))
y = unname(freq_of_freq)
Note that the OP's attempt, freq_of_freq[1, ], does not work because table() returns a one-dimensional table (effectively a named integer vector) for this example dataset. That is, we can't subset with matrix or data.frame notation because there is only one dimension.
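If plain vectors are wanted rather than a table object, one possible variation is to strip the table attributes with as.vector(). This is a sketch on the same example data, not part of the original answer.

```r
numbers <- c(1,2,3,4,1,2,3,1,2,3,1,2,3,4,2,3,5,1,2,3,4)
freq_of_freq <- table(table(numbers))

# The names of a one-dimensional table hold the tabulated values (as strings).
x <- as.integer(names(freq_of_freq))

# as.vector() drops the table class and dimnames, leaving a plain integer vector.
y <- as.vector(freq_of_freq)

x
y
```

Here x should hold 1, 3, 5, 6 and y should hold 1, 1, 1, 2, matching the table printed in the question.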

renaming dataframe column in a list

I have a list with two dataframes (each with two columns) and I want to rename a specific column in this list.
sample_df1<-data.frame(coltest11=1:6,coltest12=5:10)
sample_df2<-data.frame(coltest21=5:10,coltest22=1:6)
sample_ls<-list("a"=sample_df1, "b"=sample_df2)
colnames(sample_ls[["a"]][2])<-"test"
names(sample_ls[["a"]][2])
but the result is
[1] "coltest12"
I spent more than an hour looking at other topics but can't figure out what I am missing.
Your current problem is that sample_ls[["a"]][2] extracts the second column of a as a one-column data frame, so you end up renaming that extracted piece; when it is assigned back into a, the existing column name is kept. Instead, if you want to rename the second column in the a data frame, access the second entry of its names and rename that:
names(sample_ls$a)[2] <- "test" # the [2] belongs on the outside, not inside
sample_ls$a
coltest11 test
1 1 5
2 2 6
3 3 7
4 4 8
5 5 9
6 6 10
Data:
sample_df1 <- data.frame(coltest11=1:6, coltest12=5:10)
sample_df2 <- data.frame(coltest21=5:10, coltest22=1:6)
sample_ls <- list(a=sample_df1, b=sample_df2)
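The same fix also works with the [[ ]] notation used in the question. The sketch below rebuilds the sample list and checks the result; it uses only the names from the question.

```r
sample_ls <- list(
  a = data.frame(coltest11 = 1:6, coltest12 = 5:10),
  b = data.frame(coltest21 = 5:10, coltest22 = 1:6)
)

# Index AFTER names(): this replaces the second name of data frame "a" itself.
names(sample_ls[["a"]])[2] <- "test"

names(sample_ls[["a"]])
names(sample_ls[["b"]])  # the other data frame is left untouched
```

The only difference from the failing attempt is where the [2] sits: it selects an element of the names vector rather than a column of the data frame.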

Replace semicolon-separated values to tab

I am trying to convert the data which I have in txt file:
4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;...
to a column (table) where the values are separated by tab.
4.0945725440979
4.07999897003174
4.0686674118042...
So far I tried
mydata <- read.table("1.txt", header = FALSE)
separate_data<- strsplit(as.character(mydata), ";")
But it does not work. separate_data in this case consists of only one element:
[[1]]
[1] "1"
The OP does not state whether the raw data file contains multiple observations of a single variable or should be broken into n-tuples. Since the OP does state that read.table() results in a single row where multiple rows were expected, we can conclude that the correct technique is to use scan(), not read.table().
If the data in the raw data file represents a single variable, then the solution posted in the comments by @docendo works without additional effort. Otherwise, additional work is required to tidy the data.
Here is an approach using scan() that reads the file into a vector, and breaks it into observations containing 5 variables.
rawData <- "4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512"
value <- scan(textConnection(rawData),sep=";")
columns <- 5 # set desired # of columns
observations <- length(value) / columns
observation <- unlist(lapply(1:observations,function(x) rep(x,times=columns)))
variable <- rep(1:columns,times=observations)
data.frame(observation,variable,value)
...and the output:
> data.frame(observation,variable,value)
observation variable value
1 1 1 4.094573
2 1 2 4.079999
3 1 3 4.068667
4 1 4 4.059601
5 1 5 4.052183
6 2 1 4.094573
7 2 2 4.079999
8 2 3 4.068667
9 2 4 4.059601
10 2 5 4.052183
>
At this point the data can be converted into a wide format tidy data set with reshape2::dcast().
Note that this solution requires that the number of data values in the raw data file is evenly divisible by the number of variables.
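If the wide layout is the end goal, a base R route that skips the long format is to fill a matrix row by row. This is a minimal sketch under the same assumption of 5 variables per observation; the name wide and the default column names V1 to V5 are incidental, and scan(text = ...) stands in for reading the real file.

```r
rawData <- "4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512;4.0945725440979;4.07999897003174;4.0686674118042;4.05960083007813;4.05218315124512"

# Read the semicolon-separated values into a numeric vector.
value <- scan(text = rawData, sep = ";", quiet = TRUE)

columns <- 5  # assumed number of variables per observation

# Guard against a value count that does not divide evenly into rows.
stopifnot(length(value) %% columns == 0)

# byrow = TRUE places each consecutive run of 5 values on its own row.
wide <- as.data.frame(matrix(value, ncol = columns, byrow = TRUE))
wide
```

For the single-variable case, data.frame(value) on the scanned vector already gives one value per row.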

How to split data.frame into smaller data.frames of predetermined number of rows? [duplicate]

This question already has answers here:
The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
(11 answers)
Closed 7 years ago.
I have the following data frame:
df <- data.frame(a=rep(1:3),b=rep(1:3),c=rep(4:6),d=rep(4:6))
df
a b c d
1 1 1 4 4
2 2 2 5 5
3 3 3 6 6
I would like to have a value N which determines my window size, so for this example I will set
N <- 1
I would like to split this dataframe into equal portions of N rows and store the 3 resulting dataframes into a list.
I have the following code:
groupMaker <- function(x, y) 0:(x-1) %/% y
testlist2 <- split(df, groupMaker(nrow(df), N))
The problem is that this code renames my column names by adding an X0. in front
result <- as.data.frame(testlist2[1])
result
X0.a X0.b X0.c X0.d
1 1 1 4 4
>
I would like code that does exactly the same thing but keeps the column names as they are. Please keep in mind that my original data has a lot more than 3 rows, so I need something that is applicable to a much larger data frame.
To extract a list element, we can use [[. Also, as each list element is already a data.frame, we don't need to call as.data.frame again.
testlist2[[1]]
We can also use gl to create the grouping variable.
split(df, as.numeric(gl(nrow(df), N, nrow(df))))
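To see how this behaves when the number of rows is not a multiple of N, here is a small sketch on a made-up 5-row data frame (the data and the names groups and chunks are illustrative only). It uses the same integer-division grouping as groupMaker in the question.

```r
df <- data.frame(a = 1:5, b = 6:10)
N <- 2

# Rows 1-2 fall in group 0, rows 3-4 in group 1, row 5 in group 2.
groups <- (seq_len(nrow(df)) - 1) %/% N

chunks <- split(df, groups)

# [[ ]] returns the data frame itself, so the column names stay a and b.
chunks[[1]]
names(chunks[[1]])

# The last chunk simply holds the leftover rows.
nrow(chunks[[3]])
```

With single brackets, chunks[1] is still a list of length one, which is why as.data.frame() prefixed the column names with the list element's name.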

How to increment column in R data frame by one?

So I have data like this
Date DJIA Time
1 1/1/96 5117.12 1
2 1/2/96 5177.45 2
3 1/3/96 5194.07 3
4 1/4/96 5173.84 4
5 1/5/96 5181.43 5
6 1/8/96 5197.68 6
I want to decrement the values in the Time column by 1 and remove the first row.
I've achieved both of these steps separately-
data[-1,]
removes the first row, while
data$Time - 1
decrements, but returns only the decremented column.
How do I make it so that I get something like this
Date DJIA Time
1 1/2/96 5177.45 1
2 1/3/96 5194.07 2
3 1/4/96 5173.84 3
4 1/5/96 5181.43 4
5 1/8/96 5197.68 5
?
I've also tried
data[-1,]$Time - 1
but this again returns me only the time vector decremented by 1, as opposed to changing the entire data frame.
This you got right:
data[-1,]
data$Time - 1
But, as you said, each expression only returns a new object (a data frame without the first row, or a decremented vector); it doesn't change what you already have. So you just need to assign the results back to data:
data <- data[-1,]
data$Time <- data$Time - 1
To understand this better, note that newData <- data[-1,] creates a new data frame without the first row and leaves data untouched. If you want to transform your original data frame, you need to re-assign it with data <- .... The same goes for columns: you need to do data$column <- ....
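Put together on the sample data from the question, the two assignments look like this. The data frame is retyped by hand here, and resetting the row names is an optional extra step (not in the answer above) so that the rows are numbered from 1 as in the desired output.

```r
data <- data.frame(
  Date = c("1/1/96", "1/2/96", "1/3/96", "1/4/96", "1/5/96", "1/8/96"),
  DJIA = c(5117.12, 5177.45, 5194.07, 5173.84, 5181.43, 5197.68),
  Time = 1:6
)

# Drop the first row and store the result back in `data`.
data <- data[-1, ]

# Decrement the Time column and store it back in the same column.
data$Time <- data$Time - 1

# Optional: renumber the rows so they start at 1 again.
rownames(data) <- NULL
data
```

Without the row-name reset, the remaining rows would keep their original labels 2 to 6.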
