How to read one specific column from table() in R

From a table with more than 100,000 rows, I generated this small table with table() in R:
> TableName <- table(ProductID = test$ProductID,format(test$Dates, "%y%m%d"))
> TableName
ProductID 161024 161025 161026 161027 161028 161029 161030
        1      1      2      4      1      2      3      5
        2      4      4      7      3      8      1      8
        3      1      1      1      0      0      0      0
        6      1      1      1      0      0      0      0
        8      3      9      8      6      1      7      3
Normally I can read one specific column with TableName$ColumnName, but that doesn't work on the object returned by table() unless I first write the table to a .csv file.
Is there any way to read one specific column without writing the table to a .csv file and reading that file back into R?

The $ operator does not work on a matrix or a table, so index with [ and the column name instead:
TableName[, '161024']
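
As a minimal sketch of the indexing above: the asker's `test` data frame is not shown, so the one below is a hypothetical stand-in with the same two columns. The last two lines assume that a wide data frame is acceptable when `$` access is preferred; `as.data.frame.matrix` is base R and keeps the column names as they are.

```r
# Hypothetical stand-in for the asker's data (the real `test` is not shown)
test <- data.frame(
  ProductID = c(1, 1, 2, 2, 2, 3),
  Dates = as.Date(c("2016-10-24", "2016-10-25", "2016-10-24",
                    "2016-10-24", "2016-10-25", "2016-10-24"))
)
TableName <- table(ProductID = test$ProductID, format(test$Dates, "%y%m%d"))

# Select one column by name; the result is a named vector of counts
col_161024 <- TableName[, "161024"]
col_161024

# Select a single cell by row name and column name
TableName["2", "161024"]

# If `$` access is preferred, convert the table to a wide data frame first
wide <- as.data.frame.matrix(TableName)
wide$`161024`
```

Because the column names start with a digit, they need backticks when used after `$`.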

Related

Generate three level dependency in case a verb is attached with non verb in dependency parsing

I am using dependency parsing for a use case in R with the coreNLP package. However, I need to reshape the resulting data frame for a specific use case.
I need a data frame with three columns. I have used the code below to get as far as the dependency tree.
devtools::install_github("statsmaths/coreNLP")
coreNLP::downloadCoreNLP()
initCoreNLP()
inp_cl = "generate odd numbers from column one and print."
output = annotateString(inp_cl)
dc = getDependency(output)
sentence governor dependent type governorIdx dependentIdx govIndex depIndex
1 1 ROOT generate root 0 1 NA 1
2 1 numbers odd amod 3 2 3 2
3 1 generate numbers dobj 1 3 1 3
4 1 column from case 5 4 5 4
5 1 generate column nmod:from 1 5 1 5
6 1 column one nummod 5 6 5 6
7 1 column and cc 5 7 5 7
8 1 generate print nmod:from 1 8 1 8
9 1 column print conj:and 5 8 5 8
10 1 generate . punct 1 7 1 10
After adding POS tags with the following code, I ended up with this data frame:
ps = getToken(output)
ps = ps[,c(1,2,7,3)]
colnames(dc)[8] = "id"
dp = merge(dc, ps[,c("sentence","id","POS")],
by.x=c("sentence","governorIdx"),by.y = c("sentence","id"),all.x = T)
dp = merge(dp, ps[,c("sentence","id","POS")],
by.x=c("sentence","dependentIdx"),by.y = c("sentence","id"),all.x = T)
colnames(dp)[9:10] = c("POS_gov","POS_dep")
sentence dependentIdx governorIdx governor dependent type govIndex id POS_gov POS_dep
1 1 1 0 ROOT generate root NA 1 <NA> VB
2 1 2 3 numbers odd amod 3 2 NNS JJ
3 1 3 1 generate numbers dobj 1 3 VB NNS
4 1 4 5 column from case 5 4 NN IN
5 1 5 1 generate column nmod:from 1 5 VB NN
6 1 6 5 column one nummod 5 6 NN CD
7 1 7 5 column and cc 5 7 NN CC
8 1 8 1 generate print nmod:from 1 8 VB NN
9 1 8 5 column print conj:and 5 8 NN NN
10 1 9 1 generate . punct 1 9 VB .
If a verb (action word) governs a non-verb (non-action word), and that non-verb in turn governs other non-verbs, then a single row should describe the whole chain. For example, generate is a verb connected to numbers, and numbers is a non-verb connected to odd.
So the intended data frame is:
Topic1 Topic2 Action
numbers odd generate
column from generate
column one generate
column and generate
column from print
column one print
column and print
. generate
First, you'll need the tagger to label print as a verb rather than a noun (it is tagged NN in the output above).
Try a sentence with two independent clauses, and check whether the root of the second clause is tagged as a verb.
If so, it's a simple walk through the governorIdx column. If not, you'll need to address the mechanics of your dependency tree generator.
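
One way the walk could look, as a rough sketch in base R rather than a definitive solution. It does not call coreNLP: the `edges` data frame is typed in by hand from five rows of the question's merged table, the names `edges`, `is_verb`, `level1`, `level2` and `chains` are made up for illustration, and the `sentence` column is left out because there is only one sentence (with several sentences it would be a second join key).

```r
# Edge list copied by hand from rows 2-6 of the question's merged table
edges <- data.frame(
  governorIdx  = c(3, 1, 5, 1, 5),
  dependentIdx = c(2, 3, 4, 5, 6),
  governor     = c("numbers", "generate", "column", "generate", "column"),
  dependent    = c("odd", "numbers", "from", "column", "one"),
  POS_gov      = c("NNS", "VB", "NN", "VB", "NN"),
  POS_dep      = c("JJ", "NNS", "IN", "NN", "CD"),
  stringsAsFactors = FALSE
)

# Assumption: every verb tag starts with "VB" (VB, VBD, VBZ, ...)
is_verb <- function(pos) startsWith(pos, "VB")

# Level 1: a verb governing a non-verb (Action -> Topic1)
level1 <- edges[is_verb(edges$POS_gov) & !is_verb(edges$POS_dep),
                c("governor", "dependentIdx", "dependent")]
names(level1) <- c("Action", "idx", "Topic1")

# Level 2: a non-verb governing another non-verb (Topic1 -> Topic2)
level2 <- edges[!is_verb(edges$POS_gov) & !is_verb(edges$POS_dep),
                c("governorIdx", "dependent")]
names(level2) <- c("idx", "Topic2")

# Join on the token index shared by both levels
chains <- merge(level1, level2, by = "idx")
result <- chains[, c("Topic1", "Topic2", "Action")]
result
```

Rows for print would only appear once the tagger labels it as a verb, which is the point made in the answer above.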

Split dataframe to multiple small dataframes in R [duplicate]

I have a large dataframe I would like to split into multiple small data frames, based on the value in the Name column.
head(DATAFILE)
# Age Site Name 1 2 3 4 5
# 10 1 Orange 0 2 1 0 1
# 10 1 Apple 2 5 4 0 2
# 10 1 Banana 0 0 0 0 2
# 20 2 Orange 0 2 1 0 0
# 20 2 Apple 0 2 0 7 1
# 20 2 Banana 0 4 1 3 6
And an example of the desired output:
head(Orange)
# Age Site Name 1 2 3 4 5
# 10 1 Orange 0 2 1 0 1
# 20 2 Orange 0 2 1 0 0
I have tried
SPLIT.DATA <- split(DATAFILE, DATAFILE$Name, drop = FALSE)
But this returns a large list, and I would like individual data frames so that I can save them as .csv files. So I would like either a better way of dividing the original data frame, or a way to further divide SPLIT.DATA.
It is better to save the datasets directly from the list returned by split than to create individual objects in the global environment. Loop over the names of SPLIT.DATA and write each list element to its own csv file, building the file name by pasting the element's name onto .csv in the write.csv call.
lapply(names(SPLIT.DATA), function(nm)
write.csv(SPLIT.DATA[[nm]], paste0(nm, ".csv"), row.names = FALSE, quote = FALSE))
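
For anyone who wants to try this end to end, here is a small self-contained sketch. The data frame is a cut-down stand-in for DATAFILE (only three of the question's columns), and writing to `tempdir()` is an assumption made so the example does not clutter the working directory; `out_dir` and `paths` are illustrative names.

```r
# Cut-down stand-in for DATAFILE, using three of the columns from the question
DATAFILE <- data.frame(
  Age  = c(10, 10, 10, 20, 20, 20),
  Site = c(1, 1, 1, 2, 2, 2),
  Name = c("Orange", "Apple", "Banana", "Orange", "Apple", "Banana"),
  stringsAsFactors = FALSE
)
SPLIT.DATA <- split(DATAFILE, DATAFILE$Name)

# Each list element is an ordinary data frame and can be used on its own
SPLIT.DATA[["Orange"]]

# Write one file per group into a temporary directory (any folder would do)
out_dir <- tempdir()
paths <- file.path(out_dir, paste0(names(SPLIT.DATA), ".csv"))
for (i in seq_along(SPLIT.DATA)) {
  write.csv(SPLIT.DATA[[i]], paths[i], row.names = FALSE)
}
file.exists(paths)
```

A plain for loop is used here instead of lapply only because the return values of write.csv are not needed.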

R table function

If I have a vector numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4), and I use 'table(numbers)', I get
names 1 2 4 5
counts 2 5 4 1
What if I want the output to include 3 as well or, more generally, every number in 1:max(numbers), even those that do not occur in numbers? That is, how would I generate output like this:
names 1 2 3 4 5
counts 2 5 0 4 1
If you want R to count values that don't occur in the data, create a factor and set its levels explicitly. table returns a count for each level, including zero counts.
table(factor(numbers, levels=1:max(numbers)))
# 1 2 3 4 5
# 2 5 0 4 1
For this particular example (positive integers), tabulate would also work:
numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4)
tabulate(numbers)
# [1] 2 5 0 4 1
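
The two approaches can be compared side by side. This is only a sketch: `all_values`, `counts_table` and `counts_tab` are illustrative names, and `setNames` is used to attach labels because tabulate returns a bare integer vector.

```r
numbers <- c(1, 1, 2, 4, 2, 2, 2, 2, 5, 4, 4, 4)
all_values <- seq_len(max(numbers))

# table() on a factor: one count per level, zero for levels that never occur
counts_table <- table(factor(numbers, levels = all_values))

# tabulate() returns plain integers, so attach the labels by hand
counts_tab <- setNames(tabulate(numbers, nbins = max(numbers)), all_values)

counts_table
counts_tab
```

One difference to keep in mind: tabulate silently ignores values outside 1 to nbins (zeros, negatives, or anything larger than nbins), so the factor approach is the safer choice when the data may contain such values.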

R saving the output of table() into a data frame

I have the following data frame:
id<-c(1,2,3,4,1,1,2,3,4,4,2,2)
period<-c("first","calib","valid","valid","calib","first","valid","valid","calib","first","calib","valid")
df<-data.frame(id,period)
typing
table(df)
results in
   period
id  calib first valid
  1     1     2     0
  2     2     0     2
  3     0     0     2
  4     1     1     1
However, if I save it as a data frame 'df'
df<-data.frame(table(df))
the format of 'df' is
   id period Freq
1   1  calib    1
2   2  calib    2
3   3  calib    0
4   4  calib    1
5   1  first    2
6   2  first    0
7   3  first    0
8   4  first    1
9   1  valid    0
10  2  valid    2
11  3  valid    2
12  4  valid    1
How can I avoid this and save the first output, as it is, into a data frame?
More importantly, is there any way to get the same result using 'dcast'?
Would this help?
> data.frame(unclass(table(df)))
calib first valid
1 1 2 0
2 2 0 2
3 0 0 2
4 1 1 1
To elaborate a little: I've changed the ids in the example data.frame so that they are not 1:4, to show that the ids are carried along into the table and are not just a sequence of row numbers.
id <- c(10,20,30,40,10,10,20,30,40,40,20,20)
period <- c("first","calib","valid","valid","calib","first","valid","valid","calib","first","calib","valid")
df <- data.frame(id,period)
The new data.frame can be created in one of two ways. rengis's answer is fine for 2-column data frames that have the id column first. It won't work so well if your data frame has more than 2 columns, or if the columns are in a different order.
The alternative is to specify the columns and their order when building the table:
df3 <- data.frame(unclass(table(df$id, df$period)))
The id values end up in the new data.frame as row.names(df3). To add them as a column:
df3$id <- row.names(df3)
df3
calib first valid id
10 1 2 0 10
20 2 0 2 20
30 0 0 2 30
40 1 1 1 40
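
A slightly different route to the same wide layout, sketched with base R only. It swaps `data.frame(unclass(...))` for `as.data.frame.matrix`, puts id first as a numeric column, and drops the row names; `wide` is an illustrative name. It does not use dcast, so it does not answer that part of the question.

```r
id <- c(10, 20, 30, 40, 10, 10, 20, 30, 40, 40, 20, 20)
period <- c("first", "calib", "valid", "valid", "calib", "first",
            "valid", "valid", "calib", "first", "calib", "valid")
df <- data.frame(id, period)

# Convert the two-way table to a wide data frame, one column per period
wide <- as.data.frame.matrix(table(df$id, df$period))

# Move the ids out of the row names into a proper numeric column
wide <- cbind(id = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

Converting the row names with as.numeric only makes sense because the ids here are numbers; for character ids the conversion would be left out.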

How to convert row names of table into a vector

I have returned stats on my data using the table command as such:
subject<-c(4,4,2,2,3,3)
correct<-c(0,1,1,1,0,0)
test<-data.frame(subject,correct)
freq_test<-head(table(test$subject,test$correct))
This returns a table which looks like this
0 1
2 0 2
3 2 0
4 1 1
That's great, but the problem is that I would like the first column to be a vector rather than row.names (so that I can label it properly as "subject").
Is there a way to get this column to act in this way?
Just make a new data frame with the row names of freq_test as the first column:
> df<-data.frame(as.numeric(rownames(freq_test)),freq_test)
> colnames(df)[1]="subject"
> df
subject X0 X1
2 2 0 2
3 3 2 0
4 4 1 1
Of course, you can rename X0 and X1 to whatever you want by editing colnames(df) as above.
If you want the data in "long" format (useful for some models and plotting, and especially when your tables are more complicated), the table method for the generic function as.data.frame will take care of this for you:
> as.data.frame(table(test))
subject correct Freq
1 2 0 0
2 3 0 2
3 4 0 1
4 2 1 2
5 3 1 0
6 4 1 1
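
One thing to watch with the long format: the classifying columns come back as factors, not numbers. A minimal sketch of converting them back, assuming the original values were numeric as in the question (`long` is an illustrative name):

```r
subject <- c(4, 4, 2, 2, 3, 3)
correct <- c(0, 1, 1, 1, 0, 0)
test <- data.frame(subject = subject, correct = correct)

long <- as.data.frame(table(test))

# subject and correct are factors here; go through character to recover
# the original values rather than the internal level codes
long$subject <- as.numeric(as.character(long$subject))
long$correct <- as.numeric(as.character(long$correct))
str(long)
```

Calling as.numeric directly on a factor would return the level codes (1, 2, 3) instead of the subject ids (2, 3, 4), which is why the as.character step is there.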
I think you should have used the standard way of constructing a data.frame, which is with name=value pairs:
test <- data.frame( subject=subject, correct=correct)
In subject=subject, the first subject is taken as the column name, and the second is evaluated: the enclosing environments are searched for an object named subject, and its value is assigned to the "subject" column of "test".
