Multiple panel plot non-NA error - r

I've got a data frame that looks like this.
a[,2:25]
UT1 UT2 UT3 UT4 UT5 UT6 UT7 UT8 UT9 UT10 UT11 UT12 TR1 TR2 TR3 TR4
3094 9 0 1 37 6 2 8 1 1 6 3 1 3 0 0 1
3095 4 0 0 10 17 6 7 1 5 3 1 12 2 0 0 1
3096 18 0 0 4 6 15 14 0 7 9 3 8 5 2 1 2
3097 11 0 0 7 5 15 10 2 4 7 16 17 7 3 0 0
3098 18 0 11 2 5 11 7 3 2 1 1 0 3 3 1 1
3099 25 0 6 11 17 3 10 1 1 3 9 2 2 1 1 2
3100 1 0 1 27 12 28 27 0 2 11 6 0 1 7 4 6
3101 0 0 1 40 0 17 13 1 0 3 3 0 1 3 3 1
3102 2 0 0 30 1 9 2 1 1 5 0 0 1 3 3 0
3103 3 0 0 11 4 7 5 2 4 0 1 0 5 4 0 0
3104 5 0 0 3 1 10 4 2 3 0 3 0 7 2 1 0
TR5 TR6 TR7 TR8 TR9 TR10 TR11 TR12
3094 1 0 15 3 0 0 42 1
3095 1 0 4 29 0 0 42 0
3096 0 0 3 22 0 0 3 0
3097 1 0 4 14 0 0 2 0
3098 0 0 1 10 0 0 1 0
3099 0 0 4 41 1 0 3 0
3100 0 0 10 21 0 0 17 0
3101 0 0 2 1 1 0 13 3
3102 0 0 2 4 0 0 10 3
3103 1 0 3 4 0 0 12 1
3104 0 0 1 2 0 0 8 0
The first column of my data is time, so I separated it using
tiempo <- a$Tiempo
tiempo
[1] 618.6 618.8 619.0 619.2 619.4 619.6 619.8 620.0 620.2 620.4 620.6
In order to plot each column as a function of time and run lm on each, I used the reshape package and lattice. I'm not sure that's the best option, but it almost gets me what I want.
The code looks like this:
m <- melt(a[, 2:25])
f <- m$variable
xyplot(m$value ~ tiempo | f, panel = function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.lmline(x, y, col = 2, lty = 2)
})
The output is this graph, but I don't understand why it gives this error; I expect the values to be non-NA, so I don't see why there is a problem. In fact, the first panel plots just fine.
When I change the panel.xyplot(...) and panel.lmline(...) calls like this:
xyplot(m$value ~ tiempo | f, panel = function(x, y, ...) {
  panel.xyplot(tiempo, m$value, ...)
  panel.lmline(tiempo, m$value, col = 2, lty = 2)
})
I get this length error, but I think it's because each panel is using all the data points from m when it should only be using 11.
The lm regression I run is separate from the plotting, so this doesn't affect my statistical analysis, but I'm trying to put everything together and I can't do that if I can't plot the data. I want visual information about each regression so that I can remove outliers when the R-squared is too low, or perhaps drop that observation entirely.
I hope I've made myself clear.
Thank you very much
Edited with suggestions

You got most of the code right. It would be better to use the time (Tiempo) variable as an id variable in your melt call; this ensures the lengths of the data match up.
library(reshape2)  # a faster version of reshape
df.m <- melt(df.matias, id.var = "Tiempo")  # I stored your data in df.matias
Now we can use the melted data to make your plot:
library(lattice)
xyplot(value ~ Tiempo | variable, data = df.m,
       panel = function(x, y, ...) {
         panel.xyplot(x, y, ...)
         panel.lmline(x, y, col = 2, lty = 2)
       })
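For the outlier screening mentioned in the question, here is a minimal sketch (assuming the melted data df.m from above) that fits one lm per original column and extracts each R-squared:
# one regression per original column; assumes df.m from the melt() call above
fits <- lapply(split(df.m, df.m$variable),
               function(d) lm(value ~ Tiempo, data = d))
r2 <- sapply(fits, function(f) summary(f)$r.squared)
r2  # named vector of R-squared values, one per panel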

Related

R inspect() function, from tm package, only returns 10 outputs when using dictionary terms

I have 70 PDFs of scientific papers that I'm trying to narrow down by looking for specific terms within them, using the dictionary function of inspect(), which is part of the tm package. My PDFs are stored in a VCorpus object. Here's an example of what my code looks like using the crude dataset and common terms that would show up in (probably) every example paper in crude:
library(tm)
output.matrix <- inspect(DocumentTermMatrix(crude,
                         list(dictionary = c("i", "and", "all", "of",
                                             "the", "if", "i'm", "looking",
                                             "for", "but", "because", "has",
                                             "it", "was"))))
output <- data.frame(output.matrix)
This search only ever returns 10 papers into output.matrix. The outcome given is:
Docs all and because but for has i i'm the was
144 0 9 0 5 5 2 0 0 17 1
236 0 7 4 2 4 5 0 0 15 7
237 1 11 1 3 3 2 0 0 30 2
246 0 9 0 0 6 1 0 0 18 2
248 1 6 1 1 2 0 0 0 27 4
273 0 5 2 2 4 1 0 0 21 1
368 0 1 0 1 0 0 0 0 11 2
489 0 5 0 0 4 0 0 0 8 0
502 0 6 0 1 5 0 0 0 13 0
704 0 5 1 0 3 2 0 0 21 0
For my actual dataset of 70 papers, I know there should be more than 10, because as I add PDFs to my VCorpus that I know contain at least one of my search terms, I still only get 10 rows in the output. I want the result to be a table, like the one shown, that lists every paper in the VCorpus containing a term, not just what I assume are the first 10.
Using R version 4.0.2, macOS High Sierra 10.13.6
You are misinterpreting what inspect does. For a document-term matrix it shows the first 10 rows and columns. inspect should only be used to check whether your corpus or document-term matrix looks as you expect, never for transforming data into a data.frame. If you want the data of the document-term matrix in a data.frame, the following piece of code does this, using your example code and removing all the rows and columns that don't have a value for any of the documents or terms.
# do not use inspect as this will give a wrong result!
output.matrix <- DocumentTermMatrix(crude,
                 list(dictionary = c("i", "and", "all", "of",
                                     "the", "if", "i'm", "looking",
                                     "for", "but", "because", "has",
                                     "it", "was")))
# remove rows and columns that are all 0, staying inside a sparse matrix for speed
out <- output.matrix[slam::row_sums(output.matrix) > 0,
                     slam::col_sums(output.matrix) > 0]
# transform to data.frame
out_df <- data.frame(docs = row.names(out), as.matrix(out), row.names = NULL)
out_df
docs all and because but for. has the was
1 127 0 1 0 0 2 0 5 1
2 144 0 9 0 5 5 2 17 1
3 191 0 0 0 0 2 0 4 0
4 194 1 1 0 0 2 0 4 1
5 211 0 2 0 0 2 0 8 0
6 236 0 7 4 2 4 5 15 7
7 237 1 11 1 3 3 2 30 2
8 242 0 3 0 1 1 1 6 1
9 246 0 9 0 0 6 1 18 2
10 248 1 6 1 1 2 0 27 4
11 273 0 5 2 2 4 1 21 1
12 349 0 2 0 0 0 0 5 0
13 352 0 3 0 0 0 0 7 1
14 353 0 1 0 0 2 1 4 3
15 368 0 1 0 1 0 0 11 2
16 489 0 5 0 0 4 0 8 0
17 502 0 6 0 1 5 0 13 0
18 543 0 0 0 0 3 0 5 1
19 704 0 5 1 0 3 2 21 0
20 708 0 0 0 0 0 0 0 1
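If you then only want the documents that contain a particular term, you can filter out_df in the usual way, for example:
# documents that use "because" at least once (out_df from the code above)
out_df[out_df$because > 0, c("docs", "because")]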

data cleaning for plotting data frames

I am currently working with survey data in RStudio. I originally had two CSV files, but I merged them into one. Both CSV files contain sample IDs; the first also contains binary (0/1) variables, while the second contains ratings on a continuous scale.
Here is a sample of the data
ID O1 O2 O3 O4 O5 O6 O7 O8 S1 S2 S3 S4 S5 S6 S7 S8
22 0 1 0 1 0 1 0 1 4 6 2 6 4 3 6 2
23 0 1 0 0 1 1 0 1 5 6 10 4 5 7 7 6
24 0 1 1 0 1 0 0 1 7 4 7 8 7 6 3 9
25 0 0 1 1 0 0 1 1 3 5 5 7 4 6.9 6 5
26 0 1 0 0 1 1 0 1 2 2.5 7 5 4 5 4 3
27 0 1 1 1 0 1 0 0 6 3 4 6 5 6 5 6
28 0 1 1 1 0 0 0 1 7 4 2 8 2 1 4 5
29 0 0 1 0 1 1 1 0 2 5 1 2 4 3 2 2
30 0 1 0 1 1 1 0 0 8 2 6 7 1 7 5 4
31 0 0 0 1 0 1 1 1 7 4 3 2 4 5 7 2
32 0 0 1 0 0 1 1 1 4 7 5 3 1 6 2 3
33 0 1 1 0 1 1 0 0 7 4 5 8 8 5 6 7
For example, the 0 in O1 corresponds to the 4 in S1.
I want to write a loop that sums all of the S values according to whether the paired O value is 0 or 1:
if value in O1 is 0, add value in S1 to "sum of 0"
if value in O1 is 1, add value in S1 to "sum of 1"
repeat for all columns to get a total value for 0 and 1.
Any strategies or tips would be helpful going forward!
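One vectorized strategy, sketched here under the assumption that the merged data frame is called df with indicator columns O1..O8 and rating columns S1..S8 in matching order: since the O and S columns line up, you can treat them as matrices and index one by the other, which avoids an explicit loop.
# assumes the merged data frame is called df, with indicator columns O1..O8
# and rating columns S1..S8 in matching order
o <- as.matrix(df[, paste0("O", 1:8)])
s <- as.matrix(df[, paste0("S", 1:8)])
sum_of_0 <- sum(s[o == 0])  # total of all S values whose paired O value is 0
sum_of_1 <- sum(s[o == 1])  # total of all S values whose paired O value is 1
c(sum_of_0 = sum_of_0, sum_of_1 = sum_of_1)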

formatting table/matrix in R

I am trying to use a package whose example table is in a certain format. I am very new to R and don't know how to get my data into the same format so that I can use the package.
Their table looks like this:
Recipient
Actor 1 10 11 12 2 3 4 5 6 7 8 9
1 0 0 0 1 3 1 1 2 3 0 2 6
10 1 0 0 1 0 0 0 0 0 0 0 0
11 13 5 0 5 3 8 0 1 3 2 2 9
12 0 0 2 0 1 1 1 3 1 1 3 0
2 0 0 2 0 0 1 0 0 0 2 2 1
3 9 9 0 5 16 0 2 8 21 45 13 6
4 21 28 64 22 40 79 0 16 53 76 43 38
5 2 0 0 0 0 0 1 0 3 0 0 1
6 11 22 4 21 13 9 2 3 0 4 39 8
7 5 32 11 9 16 1 0 4 33 0 17 22
8 4 0 2 0 1 11 0 0 0 1 0 1
9 0 0 3 1 0 0 1 0 0 0 0 0
Where mine at the moment is:
X0 X1 X2 X3 X4 X5
0 0 2 3 3 0 0
1 1 0 4 2 0 0
2 0 0 0 0 0 0
3 0 2 2 0 1 0
4 0 0 3 2 0 2
5 0 0 3 3 1 0
I would like to add the Recipient and Actor labels to mine, as well as change the row and column names to 1, ..., 6.
Also my data is listed under Data in my Workspace and it says:
'num' [1:6,1:6] 0 1 ...
Whereas the example data in the workspace is shown in Values as:
'table' num [1:12,1:12] 0 1 13 ...
Please let me know if you have suggestions for getting my data into the same type and style as theirs; all help is greatly appreciated!
OK, so you have a matrix like so:
m <- matrix(c(1:9), 3)
rownames(m) <- 0:2
colnames(m) <- paste0("X", 0:2)
# X0 X1 X2
#0 1 4 7
#1 2 5 8
#2 3 6 9
First you need to remove the Xs and turn it into a table:
colnames(m) <- sub("X", "", colnames(m))
m <- as.table(m)
# 0 1 2
#0 1 4 7
#1 2 5 8
#2 3 6 9
Then you can set the dimension names:
names(dimnames(m)) <- c("Actor", "Recipient")
# Recipient
#Actor 0 1 2
# 0 1 4 7
# 1 2 5 8
# 2 3 6 9
However, usually you would create the contingency table from raw data using the table function, which would automatically return a table object. So, maybe you should fix the step creating your matrix?
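For illustration, here is how table can build that kind of object from hypothetical raw data (the actor/recipient values below are made up):
# hypothetical raw data: one row per observed interaction
obs <- data.frame(Actor     = c(1, 1, 2, 3, 3, 3),
                  Recipient = c(2, 3, 1, 1, 2, 2))
tab <- table(Actor = obs$Actor, Recipient = obs$Recipient)
tab
#      Recipient
# Actor 1 2 3
#     1 0 1 1
#     2 1 0 0
#     3 1 2 0
class(tab)  # "table", the same class as the package's example data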

How to count number of particular values

My data looks like this:
ID CO MV
1 0 1
1 5 0
1 0 1
1 9 0
1 8 0
1 0 1
2 69 0
2 0 1
2 8 0
2 0 1
2 78 0
2 53 0
2 0 1
2 3 0
3 54 0
3 0 1
3 8 0
3 90 0
3 0 1
3 56 0
4 0 1
4 56 0
4 0 1
4 45 0
4 0 1
4 34 0
4 31 0
4 0 1
4 45 0
5 0 1
5 0 1
5 67 0
I want it to look like this:
ID CO MV CONUM
1 0 1 3
1 5 0 3
1 0 1 3
1 9 0 3
1 8 0 3
1 0 1 3
2 69 0 5
2 0 1 5
2 8 0 5
2 0 1 5
2 78 0 5
2 53 0 5
2 0 1 5
2 3 0 5
3 54 0 4
3 0 1 4
3 8 0 4
3 90 0 4
3 0 1 4
3 56 0 4
4 0 1 5
4 56 0 5
4 0 1 5
4 45 0 5
4 0 1 5
4 34 0 5
4 31 0 5
4 0 1 5
4 45 0 5
5 0 1 1
5 0 1 1
5 67 0 1
I want to create a column CONUM that gives, for each value in the ID column, the total number of non-zero values in the CO column. For example, the CO column for ID 1 has 3 values other than zero, so the corresponding values in the CONUM column are 3. The MV column is 0 if the CO column has a non-zero value and 1 if the CO column is 0, so another way to create the CONUM column would be to count the number of zeros in MV per ID. It would be great if you could help me with the R code to accomplish this. Thanks.
Here is an option with data.table:
library(data.table)
setDT(df)[, CONUM := sum(CO != 0), ID][]
You can use ave in base R:
dat <- transform(dat, CONUM = ave(as.logical(CO), ID, FUN = sum))
And an option with dplyr:
# install.packages("dplyr")
library(dplyr)
dat <- dat %>%
  group_by(ID) %>%
  mutate(CONUM = sum(CO != 0))
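A quick way to convince yourself these do what you want is to run one of them on a toy subset (the small data frame below is made up for illustration):
# check the base R version on a tiny made-up subset
dat <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                  CO = c(0, 5, 0, 69, 0, 8),
                  MV = c(1, 0, 1, 0, 1, 0))
transform(dat, CONUM = ave(as.logical(CO), ID, FUN = sum))
#   ID CO MV CONUM
# 1  1  0  1     1
# 2  1  5  0     1
# 3  1  0  1     1
# 4  2 69  0     2
# 5  2  0  1     2
# 6  2  8  0     2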

cumulative counter in dataframe R

I have a dataframe with many rows, but the structure looks like this:
year factor
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
I need to add a counter as a third column. It should count the cumulative number of cells containing zero, resetting to zero each time the value 1 is encountered. The result should look like this:
year factor count
1 0 0
2 0 1
3 0 2
4 0 3
5 0 4
6 0 5
7 0 6
8 0 7
9 1 0
10 0 1
11 0 2
12 0 3
13 0 4
14 0 5
15 0 6
16 0 7
17 1 0
18 0 1
19 0 2
20 0 3
I would like to do this in a fast way, avoiding loops, since I have to repeat the operation for hundreds of files.
You can copy my data frame by pasting it in place of the "..." here:
dt <- read.table(text = "...", header = TRUE)
Perhaps a solution like this with ave would work for you:
A <- cumsum(dt$factor)
ave(A, A, FUN = seq_along) - 1
# [1] 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 0 1 2 3
Original answer:
(Missed that the first value was supposed to be "0". Oops.)
x <- rle(dt$factor == 1)
y <- sequence(x$lengths)
y[dt$factor == 1] <- 0
y
# [1] 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 0 1 2 3
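How the first approach works: cumsum(dt$factor) increases by one at every 1, so each run of rows starting at a 1 (or at the first row) shares the same group id; ave(A, A, FUN = seq_along) then numbers the rows within each group 1, 2, 3, ..., and subtracting 1 makes every run start at 0. To attach the counter as the third column, a short sketch assuming dt is the data frame read in as above:
A <- cumsum(dt$factor)
dt$count <- ave(A, A, FUN = seq_along) - 1
head(dt)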
