Plot a list that contains many data frames in R

I have a list like the one below. The name of each element (for example 1081786081) is a user id, and I want to plot day_count against time for each user.
This is easy for a single data frame:
plot(list4$day_count)
But I don't know how to do it for every element of the list. Should I use lapply?
$`1081786081`
time day_count
1 2016-01-13 2
2 2016-01-20 2
3 2016-02-06 2
4 2016-02-23 2
5 2016-03-14 2
6 2016-03-24 2
7 2016-04-06 2
8 2016-04-11 2
9 2016-05-04 2
10 2016-06-06 2
11 2016-06-26 2
12 2016-07-01 2
$`1087949661`
time day_count
1 2016-01-02 4
2 2016-01-11 2
3 2016-01-20 2
4 2016-01-21 6
5 2016-01-22 2
6 2016-01-27 4
7 2016-01-30 4
8 2016-02-02 2
9 2016-02-05 2

To plot each data.frame in the list on its own page of a single PDF, open the pdf device, loop through 'list4' with lapply, plot each element, and close the device.
pdf("yourplot.pdf")
invisible(lapply(list4, function(x) with(x, plot(time, day_count))))
dev.off()
We can also create some identifier for each plot by looping through the names of the list elements
pdf("yourplot.pdf")
invisible(lapply(names(list4), function(nm) with(list4[[nm]],
plot(time, day_count, main = paste("plot of", nm)))))
dev.off()
If we need a single plot with one line per user, we can row-bind the list elements (keeping the list names in an id column) and then do the plotting.
library(dplyr)
library(ggplot2)
bind_rows(list4, .id = "grp") %>%
ggplot(., aes(x=time, y = day_count, colour = grp)) +
geom_line() +
geom_point()
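The single-plot idea above can also be sketched in base R without extra packages. The example below is only a sketch: it uses a shortened, hypothetical copy of list4 and writes to a temporary file (out_file is an illustrative name), drawing one line per user on shared axes.

```r
# Shortened, hypothetical copy of list4: one data frame per user id
list4 <- list(
  `1081786081` = data.frame(
    time = as.Date(c("2016-01-13", "2016-01-20", "2016-02-06")),
    day_count = c(2L, 2L, 2L)
  ),
  `1087949661` = data.frame(
    time = as.Date(c("2016-01-02", "2016-01-11", "2016-01-20")),
    day_count = c(4L, 2L, 2L)
  )
)

# Shared axis ranges so that every user fits on the same plot
all_rows <- do.call(rbind, list4)
x_range <- range(all_rows$time)
y_range <- range(all_rows$day_count)

out_file <- tempfile(fileext = ".pdf")  # illustrative output path
pdf(out_file)
# Draw an empty frame first, then one line (with points) per list element
plot(x_range, y_range, type = "n", xlab = "time", ylab = "day_count")
for (i in seq_along(list4)) {
  lines(list4[[i]]$time, list4[[i]]$day_count, type = "b", col = i)
}
legend("topright", legend = names(list4), col = seq_along(list4),
       lty = 1, pch = 1)
dev.off()
```

Computing the axis ranges from all users first avoids clipping users whose dates or counts fall outside the range of the first element.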
data
list4 <- structure(list(`1081786081` = structure(list(time = structure(c(16813,
16820, 16837, 16854, 16874, 16884, 16897, 16902, 16925, 16958,
16978, 16983), class = "Date"), day_count = c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)), .Names = c("time", "day_count"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12"), class = "data.frame"), `1087949661` = structure(list(
time = structure(c(16802, 16811, 16820, 16821, 16822, 16827,
16830, 16833, 16836), class = "Date"), day_count = c(4L,
2L, 2L, 6L, 2L, 4L, 4L, 2L, 2L)), .Names = c("time", "day_count"
), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"),
class = "data.frame")), .Names = c("1081786081",
"1087949661"))

Related

Error in dunntest: Error in if (tmp$Eclass != "factor") { : the condition has length > 1

I get an error when I try to run Dunn's test on my data and I can't figure out what's causing it.
I have 4 groups with ordinal discrete data. The Kruskal-Wallis test suggests a significant difference between groups, but I can't run dunnTest afterwards.
Any help is appreciated.
> mast_cells
# A tibble: 20 × 2
group score
<ord> <dbl>
1 1 1
2 1 1
3 1 1
4 1 1
5 1 1
6 2 3
7 2 4
8 2 2
9 2 1
10 2 3
11 3 2
12 3 1
13 3 2
14 3 3
15 3 3
16 4 3
17 4 2
18 4 3
19 4 2
20 4 2
> mast_cells$group <- ordered(mast_cells$group ,
+ levels = c("1", "2", "3", "4"))
> kruskal.test( score ~ group, data = mast_cells)
Kruskal-Wallis rank sum test
data: score by group
Kruskal-Wallis chi-squared = 9.1875, df = 3, p-value = 0.0269
> library(FSA)
> dunnTest(score ~ group,
+ data = mast_cells,
+ method="Benjamini-Yekuteili")
Error in if (tmp$Eclass != "factor") { : the condition has length > 1
>
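dunnTest() does accept a formula, so the formula itself is not the problem. The error most likely comes from group being an ordered factor: its class is c("ordered", "factor"), so the internal check tmp$Eclass != "factor" compares a length-2 vector and if() fails with "the condition has length > 1". Passing the data vector as the first argument and the grouping factor as the second avoids that check. Additionally, to use the Benjamini-Yekutieli adjustment for multiple comparisons, specify method = "by".
See the code below: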
library(FSA)
mast_cells <- structure(
list(group = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L),
levels = c("1", "2", "3", "4"), class = c("ordered", "factor")),
score = c(1L, 1L, 1L, 1L, 1L, 3L, 4L, 2L, 1L, 3L,
2L, 1L, 2L, 3L, 3L, 3L, 2L, 3L, 2L, 2L)),
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14", "15", "16", "17", "18", "19", "20"),
class = "data.frame")
dunnTest(mast_cells$score, mast_cells$group, method = "by")
Output:
Dunn (1964) Kruskal-Wallis multiple comparison
p-values adjusted with the Benjamini-Yekuteili method.
Comparison Z P.unadj P.adj
1 1 - 2 -2.6685305 0.007618387 0.11199029
2 1 - 3 -2.1348244 0.032775359 0.16059926
3 2 - 3 0.5337061 0.593544894 1.00000000
4 1 - 4 -2.4999917 0.012419622 0.09128422
5 2 - 4 0.1685388 0.866159449 1.00000000
6 3 - 4 -0.3651673 0.714986507 1.00000000
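To see why an ordered factor trips a check of the form class(x) != "factor", the following base R sketch may help. It uses a small made-up group vector (not the full data) and does not call FSA at all. Converting to a plain factor should let the formula interface pass that check too, although that is an assumption worth verifying against the installed FSA version.

```r
# Same construction as in the question: an ordered factor
group <- ordered(c("1", "1", "2", "2", "3", "4"),
                 levels = c("1", "2", "3", "4"))

class(group)                      # "ordered" "factor": two elements
group_cls_check <- class(group) != "factor"
length(group_cls_check)           # 2, so if() cannot use it as one condition

# Dropping the ordering leaves a plain factor with the same levels
group_plain <- factor(group, ordered = FALSE)
class(group_plain)                # "factor"
```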

Select certain columns from A table to JOIN certain columns in B table in R

1. I want something like cutting (Ctrl+X) some columns from one table and pasting (Ctrl+V) them into chosen columns of another table.
2. I want to select columns from the bigger table and append them to the smaller table, which full_join with by cannot do directly (the tables also have different column names).
#A table (bigger table)
Manufactor Models date Serial
1 audi r55 21341 34j
2 bmw e44 13214 F34
3 cadillc fr4c 23124 00deaa
4 benz c45z 21415 3rf
5 lexus l56fs 97014 3r
6 toyota de22 75199 2ghre
#B table (smaller table)
Markers Price Types
1 Asaudi 4011 ar55
2 abmw 2334 ae44
3 acadillc 1445 fsr4c
4 fbenz 1455 cdf45z
5 falexus 5551l5 ff6fs
6 12toyota 51242 de22
Expected result
#B table
Markers Price Types
1 Asaudi 4011 ar55
2 abmw 2334 ae44
3 acadillc 1445 fsr4c
4 fbenz 1455 cdf45z
5 falexus 5551l5 ff6fs
6 12toyota 51242 de22
7 audi NA r55
8 bmw NA e44
9 cadillc NA fr4c
10 benz NA c45z
11 lexus NA l56fs
12 toyota NA de22
One way is to first drop the unnecessary columns from table A so that it fits the full_join by = c("x col name" = "y col name") limitation, but that is inefficient. Is there a cleaner and more efficient way to do this?
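Your illustration suggests that you can achieve the expected result with the code snippet below, which drops the unneeded columns from A, renames the remaining ones to match B, and appends the rows to B: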
library(dplyr)
A %>%
select(-date, -Serial) %>%
`colnames<-`(c('Markers','Types')) %>%
bind_rows(B,.)
Output is:
Markers Price Types
1 Asaudi 4011 ar55
2 abmw 2334 ae44
3 acadillc 1445 fsr4c
4 fbenz 1455 cdf45z
5 falexus 5551l5 ff6fs
6 12toyota 51242 de22
7 audi <NA> r55
8 bmw <NA> e44
9 cadillc <NA> fr4c
10 benz <NA> c45z
11 lexus <NA> l56fs
12 toyota <NA> de22
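The same append can be sketched in base R by building a data frame with B's column layout from the relevant columns of A. This is only a sketch on shortened, character-typed copies of the two tables; A_as_B is an illustrative name, not something from the question.

```r
# Shortened copies of the two tables, stored as character instead of factor
A <- data.frame(
  Manufactor = c("audi", "bmw", "cadillc"),
  Models     = c("r55", "e44", "fr4c"),
  date       = c(21341L, 13214L, 23124L),
  Serial     = c("34j", "F34", "00deaa"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  Markers = c("Asaudi", "abmw", "acadillc"),
  Price   = c("4011", "2334", "1445"),
  Types   = c("ar55", "ae44", "fsr4c"),
  stringsAsFactors = FALSE
)

# Map A's columns onto B's layout; Price has no counterpart in A, so it is NA
A_as_B <- data.frame(
  Markers = A$Manufactor,
  Price   = NA_character_,
  Types   = A$Models,
  stringsAsFactors = FALSE
)

# Append the reshaped rows of A below B
result <- rbind(B, A_as_B)
result
```

Naming the target columns explicitly keeps the mapping between the two tables visible, which may be easier to maintain than renaming by position.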
Sample data:
> dput(A)
structure(list(Manufactor = structure(c(1L, 3L, 4L, 2L, 5L, 6L
), .Label = c("audi", "benz", "bmw", "cadillc", "lexus", "toyota"
), class = "factor"), Models = structure(c(6L, 3L, 4L, 1L, 5L,
2L), .Label = c("c45z", "de22", "e44", "fr4c", "l56fs", "r55"
), class = "factor"), date = c(21341L, 13214L, 23124L, 21415L,
97014L, 75199L), Serial = structure(c(3L, 6L, 1L, 5L, 4L, 2L), .Label = c("00deaa",
"2ghre", "34j", "3r", "3rf", "F34"), class = "factor")), .Names = c("Manufactor",
"Models", "date", "Serial"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
> dput(B)
structure(list(Markers = structure(c(4L, 2L, 3L, 6L, 5L, 1L), .Label = c("12toyota",
"abmw", "acadillc", "Asaudi", "falexus", "fbenz"), class = "factor"),
Price = structure(c(4L, 3L, 1L, 2L, 6L, 5L), .Label = c("1445",
"1455", "2334", "4011", "51242", "5551l5"), class = "factor"),
Types = structure(c(2L, 1L, 6L, 3L, 5L, 4L), .Label = c("ae44",
"ar55", "cdf45z", "de22", "ff6fs", "fsr4c"), class = "factor")), .Names = c("Markers",
"Price", "Types"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6"))

Remove column names in a DataFrame

In sparkR I have a DataFrame data.
When I type head(data) I get this output:
C0 C1 C2 C3
1 id user_id foreign_model_id machine_id
2 1 3145 4 12
3 2 4079 1 8
4 3 1174 7 1
5 4 2386 9 9
6 5 5524 1 7
I want to remove C0, C1, C2 and C3 because they cause problems later on. For example, when I use the filter function:
filter(data,data$machine_id==1)
it does not run because of this.
I have read the data like this
data <- read.df(sqlContext, "/home/ole/.../data", "com.databricks.spark.csv")
SparkR made the header into the first row and gave the DataFrame a new header because the default for the header option is "false". Set the header option to header="true" and you won't have to deal with this problem.
data <- read.df(sqlContext, "/home/ole/.../data", "com.databricks.spark.csv", header="true")
Try
colnames(data) <- unlist(data[1,])
data <- data[-1,]
> data
# id user_id foreign_model_id machine_id
#2 1 3145 4 12
#3 2 4079 1 8
#4 3 1174 7 1
#5 4 2386 9 9
#6 5 5524 1 7
If you wish, you can add rownames(data) <- NULL to correct for the row numbers after the deletion of the first row.
After this manipulation, you can select rows that correspond to certain criteria, like
subset(data, data$machine_id==1)
# id user_id foreign_model_id machine_id
#4 3 1174 7 1
In base R, the function filter() suggested in the OP is part of the stats namespace and is usually reserved for the analysis of time series.
data
data <- structure(list(C0 = structure(c(6L, 1L, 2L, 3L, 4L, 5L),
.Label = c("1", "2", "3", "4", "5", "id"), class = "factor"),
C1 = structure(c(6L, 3L, 4L, 1L, 2L, 5L), .Label = c("1174", "2386",
"3145", "4079", "5524", "user_id"), class = "factor"),
C2 = structure(c(5L, 2L, 1L, 3L, 4L, 1L),
.Label = c("1", "4", "7", "9", "foreign_model_id"), class = "factor"),
C3 = structure(c(6L, 2L, 4L, 1L, 5L, 3L),
.Label = c("1", "12", "7", "8", "9", "machine_id"), class = "factor")),
.Names = c("C0", "C1", "C2", "C3"), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6"))
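Try this:
# Collect the values of the first row as the new column names
new_names <- character(0)
for (i in seq_along(names(data))) {
new_names <- c(new_names, toString(data[1, i]))
}
names(data) <- new_names
# Drop the first row, which held the header
data <- data[-1, ]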
I can't use these answers because they don't run in SparkR: object of type 'S4' is not subsettable. I solved the problem the following way, although I think there is a better way to do it.
data <- withColumnRenamed(data, "C0","id")
data <- withColumnRenamed(data, "C1","user_id")
data <- withColumnRenamed(data, "C2","foreign_model_id")
data <- withColumnRenamed(data, "C3","machine_id")
And now I can successfully use the filter function as I want to.

Create crosstab from three values

I have a data frame with three variables and I want the first variable to be the row names, the second variable to be the column names, and the third variable to be the values associated with those two parameters, with NA or blank where data may be missing. Is this easy/possible to do in R?
example input
structure(list(
Player = c("1","1","2","2","3","3","4","4","5","5","6"),
Type = structure(c(2L, 1L, 2L, 1L, 2L, 1L,2L, 1L, 2L, 1L, 1L),
.Label = c("Long", "Short"), class = "factor"),
Yards = c("23","41","50","29","11","41","48","12","35","27","25")),
.Names = c("Player", "Type", "Yards"),
row.names = c(NA, 11L),
class = "data.frame")
Using the sample data you gave:
df <- structure(list(Player = c("1", "1", "2", "2", "3", "3", "4", "4", "5",
"5", "6"), Type = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L),
.Label = c("Long", "Short"), class = "factor"),
Yards = c("23", "41", "50", "29", "11", "41", "48", "12", "35", "27", "25")),
.Names = c("Player", "Type", "Yards"), row.names = c(NA, 11L),
class = "data.frame")
Player Type Yards
1 1 Short 23
2 1 Long 41
3 2 Short 50
4 2 Long 29
5 3 Short 11
6 3 Long 41
7 4 Short 48
8 4 Long 12
9 5 Short 35
10 5 Long 27
11 6 Long 25
dcast will be able to tabulate the two variables.
library(reshape2)
df.cast <- dcast(df, Player~Type, value.var="Yards")
Player will be a regular column in the result, so a little extra work is needed to make it the row names of the data.frame:
rownames(df.cast) <- df.cast$Player
df.cast$Player <- NULL
Long Short
1 41 23
2 29 50
3 41 11
4 12 48
5 27 35
6 25 <NA>
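As an alternative that needs no extra package, a base R sketch with tapply() can build the same cross-tabulation as a matrix. It assumes Yards has been converted to numeric and that each Player/Type pair occurs at most once (with sum as the function, duplicates would be added together). The name crosstab is illustrative.

```r
# Same values as the example input, with Yards stored as numeric
df <- data.frame(
  Player = c("1", "1", "2", "2", "3", "3", "4", "4", "5", "5", "6"),
  Type   = c("Short", "Long", "Short", "Long", "Short", "Long",
             "Short", "Long", "Short", "Long", "Long"),
  Yards  = c(23, 41, 50, 29, 11, 41, 48, 12, 35, 27, 25),
  stringsAsFactors = FALSE
)

# Rows are players, columns are types; combinations with no data stay NA
crosstab <- with(df, tapply(Yards, list(Player = Player, Type = Type), sum))
crosstab
```

The result is a numeric matrix whose row names are the players, so no separate step is needed to move Player out of a column.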

merging two files into a new file

I have 2 files with say 3 columns and a few rows.
1 2 10
2 3 20
3 4 30
4 5 40
5 1 50
6 1 60
and
1 8 10
2 3 100
3 4 45
4 5 78
5 2 99
6 80 60
Now I want to create a third file containing all the rows of the first two files. If the first and second columns match in both files, the third column of the new file should hold the value from the third column of the first file, and the fourth column should hold the value from the third column of the second file.
According to the above example, the answer should be
1 2 10 0
2 3 20 100
3 4 30 45
4 5 40 78
1 8 10 0
5 1 50 0
6 1 60 0
5 2 99 0
6 80 60 0
res <- merge(dat1,dat2, by=c("V1", "V2"),all=TRUE)
indx <- is.na(res[,3])
res[indx,3] <- res[indx,4]
res[indx,4] <- NA
res[is.na(res)] <- 0
# V1 V2 V3.x V3.y
#1 1 2 10 0
#2 1 8 10 0
#3 2 3 20 100
#4 3 4 30 45
#5 4 5 40 78
#6 5 1 50 0
#7 5 2 99 0
#8 6 1 60 0
#9 6 80 60 0
data
dat1 <- structure(list(V1 = structure(1:6, .Label = c("1", "2", "3",
"4", "5", "6"), class = "factor"), V2 = structure(c(2L, 3L, 4L,
5L, 1L, 1L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
V3 = structure(1:6, .Label = c("10", "20", "30", "40", "50",
"60"), class = "factor")), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA,
-6L))
dat2 <- structure(list(V1 = structure(1:6, .Label = c("1", "2", "3",
"4", "5", "6"), class = "factor"), V2 = structure(c(5L, 2L, 3L,
4L, 1L, 6L), .Label = c("2", "3", "4", "5", "8", "80"), class = "factor"),
V3 = structure(c(1L, 2L, 3L, 5L, 6L, 4L), .Label = c("10",
"100", "45", "60", "78", "99"), class = "factor")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -6L))
Convert the data columns to numeric class before you try the above code
dat1[] <- lapply(dat1, function(x) as.numeric(as.character(x)))
dat2[] <- lapply(dat2, function(x) as.numeric(as.character(x)))
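If the data are read with read.table(), the columns arrive as numbers and the conversion step is not needed. The sketch below follows the same merge logic end to end; it reads the example rows from inline text (with real files, the text argument would be replaced by a file path) and writes the result to a temporary file. The names only_in_2 and out_file are illustrative.

```r
# Read the two example tables; columns are named V1, V2, V3 by default
dat1 <- read.table(text = "1 2 10
2 3 20
3 4 30
4 5 40
5 1 50
6 1 60")
dat2 <- read.table(text = "1 8 10
2 3 100
3 4 45
4 5 78
5 2 99
6 80 60")

# Full outer join on the first two columns
res <- merge(dat1, dat2, by = c("V1", "V2"), all = TRUE)

# Rows found only in the second file: move their value into the third column
only_in_2 <- is.na(res$V3.x)
res$V3.x[only_in_2] <- res$V3.y[only_in_2]
res$V3.y[only_in_2] <- NA

# Remaining gaps become 0, then the merged table is written out
res[is.na(res)] <- 0
out_file <- tempfile(fileext = ".txt")  # illustrative output path
write.table(res, out_file, row.names = FALSE, col.names = FALSE)
```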
It would be easier if you post an example with dput(). I would check if ?merge helps or rbind.fill (package plyr).
Hope this helps
Hermann
