I have a table produced by calling table(...) on a column of data, and I get a table that looks like:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
346 351 341 333 345 415 421 425 429 437 436 469 379 424 387 419 392 396 381 421
I'd like to draw a boxplot of these frequencies, but calling boxplot on the table results in an error:
Error in Axis.table(x = c(333, 368.5, 409.5, 427, 469), side = 2) :
only for 1-D table
I've tried coercing the table to an array with as.array but it seems to make no difference. What am I doing wrong?
If I understand you correctly, boxplot(c(tab)) or boxplot(as.vector(tab)) should work (credit to @joran as well). Both calls drop the table class and dimensions, so boxplot receives a plain vector of counts.
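A minimal sketch of that fix on made-up data (the vector x, the seed, and the name freq are assumptions for illustration, not the asker's data):

```r
# Simulated column of values 0..19 (illustrative data only)
set.seed(1)
x <- sample(0:19, 8000, replace = TRUE)

# Frequency table; factor() with explicit levels guarantees all 20 cells exist
tab <- table(factor(x, levels = 0:19))

# as.vector() strips the "table" class and dim attributes,
# leaving a plain integer vector of counts
freq <- as.vector(tab)

# boxplot() now sees an ordinary numeric vector
boxplot(freq, ylab = "Frequency")
```

c(tab) behaves the same way for plotting, except that it keeps the category labels as names.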
I would like to recode a numerical variable based on a cut score criterion. If the cut scores are not available in the variable, I would like to recode the closest smaller value as a cut score. Here is a snapshot of dataset:
ids <- c(1,2,3,4,5,6,7,8,9,10)
scores <- c(512,531,541,555,562,565,570,572,573,588)
data <- data.frame(ids, scores)
> data
ids scores
1 1 512
2 2 531
3 3 541
4 4 555
5 5 562
6 6 565
7 7 570
8 8 572
9 9 573
10 10 588
cuts <- c(531, 560, 575)
The first cut score (531) is in the dataset. So it will stay the same as 531. However, 560 and 575 were not available. I would like to recode the closest smaller value (555) to the second cut score as 560 in the new column, and for the third cut score, I'd like to recode 573 as 575.
Here is what I would like to get.
ids scores rescored
1 1 512 512
2 2 531 531
3 3 541 541
4 4 555 560
5 5 562 562
6 6 565 565
7 7 570 570
8 8 572 572
9 9 573 575
10 10 588 588
Any thoughts?
Thanks
One option is to find the index of each cut score with findInterval, take the pmax of the 'scores' at that index and the 'cuts', and update the 'rescored' column elements at that index.
i1 <- with(data, findInterval(cuts, scores))
data$rescored <- data$scores
data$rescored[i1] <- with(data, pmax(scores[i1], cuts))
data
# ids scores rescored
#1 1 512 512
#2 2 531 531
#3 3 541 541
#4 4 555 560
#5 5 562 562
#6 6 565 565
#7 7 570 570
#8 8 572 572
#9 9 573 575
#10 10 588 588
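To see why this works, it may help to look at what findInterval returns on its own. The sketch below reuses the question's data; the last call (a cut of 500) is an added illustration of an edge case, assuming the scores are sorted in ascending order as findInterval requires.

```r
scores <- c(512, 531, 541, 555, 562, 565, 570, 572, 573, 588)
cuts <- c(531, 560, 575)

# For each cut, the position of the largest score that is <= the cut
i1 <- findInterval(cuts, scores)
print(i1)          # 2 4 9
print(scores[i1])  # 531 555 573

# pmax keeps an exact match (531) and lifts the others up to the cut
print(pmax(scores[i1], cuts))  # 531 560 575

# Edge case: a cut below every score yields index 0, which selects nothing,
# so such cuts would need to be dropped before the assignment step
print(findInterval(500, scores))  # 0
```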
I have a dataframe of points with x,y positions (in pixels) and would like to filter out all points that lie within +/- 5 pixels of each other. Is there a function similar to dplyr::distinct() but with a cutoff?
Example dataset:
X.1 X Y
1 637 614
2 559 503
3 601 459
4 601 459
5 603 462
6 604 460
I am expecting an output of :
X.1 X Y
1 637 614
2 559 503
3 601 459 <- the first element is preserved.
Thanks
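A simple solution is to round your data to the nearest multiple of 5 and then use a regular distinct function (below, df stands for your data frame with columns X.1, X and Y):
df$X <- round(df$X/5)*5
df$Y <- round(df$Y/5)*5
dplyr::distinct(df, X, Y, .keep_all = TRUE)
#Output:
X.1 X Y
1 635 615
2 560 505
3 600 460
5 605 460
Your problem may require a higher level of accuracy, however: rounding sorts points into fixed 5-pixel bins, so the coordinates are altered and two nearby points that fall on opposite sides of a bin boundary (here X = 601 and X = 603) are both kept.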
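If the rounding approach is too coarse, one alternative is a greedy filter that compares the original coordinates directly. This is only a sketch, not a dplyr feature: the names pts, tol, and keep are made up for the example, and "within 5 pixels" is assumed to mean that both the X and the Y difference are at most 5.

```r
pts <- data.frame(
  X = c(637, 559, 601, 601, 603, 604),
  Y = c(614, 503, 459, 459, 462, 460)
)
tol <- 5  # assumed tolerance in pixels

keep <- logical(nrow(pts))  # all FALSE to start
for (i in seq_len(nrow(pts))) {
  kept <- which(keep)
  # TRUE for every already-kept point within tol of point i on both axes
  near <- abs(pts$X[kept] - pts$X[i]) <= tol &
          abs(pts$Y[kept] - pts$Y[i]) <= tol
  # Keep point i only if no earlier kept point is that close
  keep[i] <- !any(near)
}

result <- pts[keep, ]
print(result)
```

On the example data this keeps rows 1, 2 and 3 with their coordinates unchanged. The loop is quadratic in the number of points, so it suits small to medium data sets.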
I'm trying to create a line plot in R, showing lines for different places over time.
My data is in a table with Year in the first column, the places England, Scotland, Wales, NI as separate columns:
Year England Scotland Wales NI
1 2006/07 NA 411 188 111
2 2007/08 NA 415 193 112
3 2008/09 NA 424 194 114
4 2009/10 NA 429 194 115
5 2010/11 NA 428 199 116
6 2011/12 NA 428 200 116
7 2012/13 NA 425 199 117
8 2013/14 NA 427 202 117
9 2014/15 NA 431 200 121
10 2015/16 3556 432 199 126
11 2016/17 3436 431 200 129
12 2017/18 3467 NA NA NA
I'm using ggplot, and can get a lineplot for any of the places, but I'm having difficulty getting lines for all the places on the same plot.
It seems like this might work if I had the places in a column as well (instead of across the top), as then I could set y in the code below to be that column, as opposed to the column that is a specific place. But that seems a bit convoluted and as I have lots of data in the existing format, I'm hoping there's either a way to do this with the format I have or a quick way of transforming it.
ggplot(data=mysheets$sheet1, aes(x=Year, y=England, group=1)) +
geom_line()+
geom_point()
From what I can tell, I'll need to reshape my data (into long form?) but I haven't found a way to do that where I don't have a column for places (i.e., I have a column for each place but the table doesn't have a way of saying these are all places and the same kind of thing).
I've also tried transposing my data, so the places are down the side and the years are along the top, but R still has its own headers for the columns - I guess another option might be if it was possible to have the years as headers and have that recognised by R?
As you said, you have to convert to long format to make the most out of ggplot2.
library(ggplot2)
library(dplyr)
mydata_raw <- read.table(
text = "
Year England Scotland Wales NI
1 2006/07 NA 411 188 111
2 2007/08 NA 415 193 112
3 2008/09 NA 424 194 114
4 2009/10 NA 429 194 115
5 2010/11 NA 428 199 116
6 2011/12 NA 428 200 116
7 2012/13 NA 425 199 117
8 2013/14 NA 427 202 117
9 2014/15 NA 431 200 121
10 2015/16 3556 432 199 126
11 2016/17 3436 431 200 129
12 2017/18 3467 NA NA NA"
)
# long format
mydata <- mydata_raw %>%
tidyr::gather(country, value, England:NI) %>%
dplyr::mutate(Year = as.numeric(substring(Year, 1, 4))) # convert to numeric date
ggplot(mydata, aes(x = Year, y = value, color = country)) +
geom_line() +
geom_point()
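For readers unsure what "long format" means here, the following base R sketch builds the same shape by hand on a three-row excerpt of the data. The names wide, places, and long are made up for the example; gather() above does the equivalent work in one step.

```r
# Small excerpt of the wide table: one column per place
wide <- data.frame(
  Year = c("2015/16", "2016/17", "2017/18"),
  England = c(3556, 3436, 3467),
  Scotland = c(432, 431, NA),
  stringsAsFactors = FALSE
)

# Every column except Year is treated as a place
places <- setdiff(names(wide), "Year")

# Long format: one row per (Year, place) pair
long <- data.frame(
  Year = rep(wide$Year, times = length(places)),
  country = rep(places, each = nrow(wide)),
  value = unlist(wide[places], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```

The place names move from the column headers into the country column, which is what lets ggplot map them to color.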
I have the following data set
df <- data.frame(student=c(1,2,3,4,5,6,7,8,9), sat=c(365,0,545,630,385,410,0,655,0), act=c(28,20,0,0,16,17,35,29,21))
student sat act
1 365 28
2 0 20
3 545 0
4 630 0
5 385 16
6 410 17
7 0 35
8 655 29
9 0 21
and I'd like to create a new field with the following conditions
If there is an SAT score > 0 use SAT score
If SAT=0, then convert the ACT to an SAT score using the rubric here. (When there was a range in the SAT score, I just used the median.)
ACT SAT
8 200
9 210
10 220
11 225
12 250
13 285
14 325
15 360
16 385
17 410
18 440
19 465
20 485
21 505
22 525
23 545
24 560
25 575
26 595
27 615
28 635
29 655
30 675
31 700
32 725
33 750
34 775
35 790
36 800
This is one heck of an ifelse statement. I've tried this:
df$newgrade=-ifelse(ACT=8,200, ifelse (ACT=9,210, ifelse(ACT=10,220, ifelse (ACT=11,225, ACT=12,250, ifelse(ACT=13,285, ifelse (ACT=14,325, ACT=15,D, ifelse(ACT=16,C, ifelse (ACT=17,B, ACT=18,D, ifelse(ACT=19,C, ifelse (ACT=20,B, ACT=21,D, ifelse(ACT=22,C, ifelse (ACT=23,B, ACT=24,D, ifelse(ACT=25,C, ifelse (ACT=26,B, ACT=27,D, ifelse(ACT=28,C, ifelse (ACT=29,B, ACT=30,D, ifelse(ACT=31,C, ifelse (ACT=32,B, ACT=33,D, ifelse(ACT=34,C, ifelse (ACT=35,B, ACT=36,D))))))))))))))))))))
I tried to follow the example at the bottom of this page but it didn't work.
Does anyone have any ideas on how best to achieve this new field?
Thank you for any assistance you may bring.
Let's call conversion the table you want to use to convert values when df$sat==0. You can do something like this:
df$newgrade<-ifelse(df$sat == 0, conversion$SAT[match(df$act, conversion$ACT)], df$sat)
EDIT: If you want to add another condition, so that df$newgrade is 0 when both df$sat == 0 and df$act == 0, you can nest another ifelse:
df$newgrade<-ifelse(df$sat == 0 & df$act == 0, 0, ifelse(df$sat == 0, conversion$SAT[match(df$act, conversion$ACT)], df$sat))
or use df[is.na(df)]<-0 after creating the column df$newgrade, because in those cases (df$sat == 0 and df$act == 0) you'll have NAs
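The answer assumes a lookup table named conversion with columns ACT and SAT. A sketch of how it could be built from the rubric in the question and then applied (the column names are an assumption chosen to match the code above):

```r
df <- data.frame(
  student = 1:9,
  sat = c(365, 0, 545, 630, 385, 410, 0, 655, 0),
  act = c(28, 20, 0, 0, 16, 17, 35, 29, 21)
)

# Lookup table transcribed from the rubric in the question (ACT 8..36)
conversion <- data.frame(
  ACT = 8:36,
  SAT = c(200, 210, 220, 225, 250, 285, 325, 360, 385, 410,
          440, 465, 485, 505, 525, 545, 560, 575, 595, 615,
          635, 655, 675, 700, 725, 750, 775, 790, 800)
)

# match() returns, for each ACT score, its row in the lookup table
# (NA when the score is not listed, e.g. act == 0)
converted <- conversion$SAT[match(df$act, conversion$ACT)]

# Use the real SAT score when present, otherwise the converted ACT
df$newgrade <- ifelse(df$sat == 0, converted, df$sat)
print(df)
```

Students 2, 7 and 9 have no SAT score, so their ACT scores of 20, 35 and 21 become 485, 790 and 505.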
I wanted to ask for help, because I am having difficulties ordering my table, because the column for the table to be ordered has duplicates (coltoorder). This is a tiny part of my table. The desired order is custom, roughly speaking, it is based on the order of the first column, except for the first value (887).
text<-"col1 col2 col3 coltoorder
895 2 1374 887
888 2 14 887
1018 3 1065 895
896 2 307 895
889 2 4 888
891 2 8 888
1055 2 971 1018
926 3 241 896
1021 2 87 1018
897 2 64 896"
mytable<-read.table(text=text, header = T)
mytable
desired order
myindex<-c(887,895,888,1018,896) # equivalent to
myindex2<-c(887,887,895,895,888,888,1018,1018,896,896)
some failed attempts
try1<-mytable[match(myindex, mytable$coltoorder),]
try2<-mytable[match(myindex2, mytable$coltoorder),]
try3<-mytable[mytable$coltoorder %in% myindex,]
try3<-mytable[myindex %in% mytable$coltoorder,]
try4<-mytable[myindex2 %in% mytable$coltoorder,]
rownames(mytable) <- mytable$coltoorder # error
It seems like coltoorder should be treated categorically, not numerically. All factors have an order of their levels, so we'll convert to a factor where the levels are ordered according to myindex. Then this ordering is "baked in" to the column and we can use order normally on it.
mytable$coltoorder = factor(mytable$coltoorder, levels = myindex)
mytable[order(mytable$coltoorder), ]
# col1 col2 col3 coltoorder
# 8 895 2 1374 887
# 1 888 2 14 887
# 131 1018 3 1065 895
# 9 896 2 307 895
# 2 889 2 4 888
# 4 891 2 8 888
# 168 1055 2 971 1018
# 134 1021 2 87 1018
# 39 926 3 241 896
# 10 897 2 64 896
Do be careful - this column is now a factor not a numeric. If you want to recover the numeric values from a factor, you need to convert via character: original_values = as.numeric(as.character(mytable$coltoorder)).
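A short illustration of that pitfall, using a few values from the question (the variable names f, codes, and values are made up for the example):

```r
myindex <- c(887, 895, 888, 1018, 896)
f <- factor(c(887, 895, 888), levels = myindex)

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)
print(codes)   # 1 2 3

# Going through character recovers the original numbers
values <- as.numeric(as.character(f))
print(values)  # 887 895 888
```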
Your data sample suggests that your desired sort order is equivalent to the first appearance in column coltoorder.
If this is true, the function fct_inorder() from Hadley Wickham's forcats package may be particularly helpful here:
mytable$coltoorder <- forcats::fct_inorder(as.character(mytable$coltoorder))
mytable[order(mytable$coltoorder), ]
col1 col2 col3 coltoorder
1 895 2 1374 887
2 888 2 14 887
3 1018 3 1065 895
4 896 2 307 895
5 889 2 4 888
6 891 2 8 888
7 1055 2 971 1018
9 1021 2 87 1018
8 926 3 241 896
10 897 2 64 896
fct_inorder() reorders factor levels by first appearance. So, there is no need to create a separate myindex vector.
However, the caveats from Gregor's answer apply as well.
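If turning the column into a factor is undesirable, a base R alternative (not part of either answer above) might be to build a sort key with match() and leave the column numeric. The sketch below rebuilds the question's table by hand; sorted is a made-up name.

```r
mytable <- data.frame(
  col1 = c(895, 888, 1018, 896, 889, 891, 1055, 926, 1021, 897),
  col2 = c(2, 2, 3, 2, 2, 2, 2, 3, 2, 2),
  col3 = c(1374, 14, 1065, 307, 4, 8, 971, 241, 87, 64),
  coltoorder = c(887, 887, 895, 895, 888, 888, 1018, 896, 1018, 896)
)
myindex <- c(887, 895, 888, 1018, 896)

# match() gives each row the position of its value in myindex;
# order() then sorts by that position and keeps ties in their original order
sorted <- mytable[order(match(mytable$coltoorder, myindex)), ]
print(sorted)

# coltoorder is still numeric, so no conversion back is needed
print(is.numeric(sorted$coltoorder))
```

For first-appearance order, unique(mytable$coltoorder) could stand in for myindex.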