'at' and 'labels' lengths differ, 13 != 10 error in R

I have a data frame, df, that looks like this:
a b c d e f g h i j k l m
1a 4 4 3 4 3 4 3 4.0 4.0 4 4 4 3.9
1b 9 9 9 9 9 9 9 8.1 8.8 9 9 9 8.5
1c 8 8 9 8 9 8 8 8.0 9.0 8 9 8 8.3
1d 8 8 8 9 8 9 8 8.0 8.0 8 8 8 8.5
1e 4 4 4 4 4 4 4 4.0 4.0 4 4 4 4.0
2a 3 4 3 4 3 4 3 4.0 3.0 4 3 4 3.8
2b 8 8 8 8 8 8 8 8.0 8.0 8 8 8 8.0
2c 8 8 8 8 8 8 8 9.0 8.0 9 8 8 8.3
2d 8 9 8 8 8 9 8 9.0 8.0 9 8 9 8.0
2e 4 3 4 3 4 4 4 4.0 4.0 4 4 3 3.9
I am using the plotrix and devtools packages. I have installed both, and loaded the barp2 function, like this:
# install the packages
install.packages('plotrix')
install.packages('devtools')
# load devtools first: install_url() and source_gist() come from it
library(devtools)
install_url("http://cran.r-project.org/src/contrib/Archive/plotrix/plotrix_3.5-2.tar.gz")
# load the barp2 function
source_gist("https://gist.github.com/tleja/8592929")
# load plotrix
library(plotrix)
The modified code (barp2) that I'm using is available here.
I am trying to plot the data in the data frame, provided above, like this:
par(mar = c(5, 4, 4, 6))
barp2(df, pch=t(c(0:4, 7:14)), names.arg=rownames(df), legend.lab=colnames(df),
ylab="y label", main="main")
I am using the reference chart to fill the bars of the plot.
I want the rownames of df to be the x axis labels, and the colnames of df to be in a legend.
However, I keep getting this error:
Error in axis(1, at = x, labels = names.arg, cex.axis = cex.axis) :
'at' and 'labels' lengths differ, 13 != 10
I understand that this is because rownames(df) has a length of 10 and colnames(df) has a length of 13 (which are clearly not equal), but I am not sure how to fix this problem, so that the data in the data frame is displayed in a barplot.
Or if I swap the columns and rows around using t(df), like this:
barp2(t(df), pch=t(c(0:4, 7:11)), names.arg=rownames(df), legend.lab=colnames(df),
ylab="y label", main="main")
I get this error:
Error in seq.default(x1[frect] + xinc[frect]/2, x2[frect] - xinc[frect]/2, :
wrong sign in 'by' argument
I have no idea what this error means or why I get it.
Sorry I can't provide an image of what it should look like, but hopefully you get the basic idea of it.
Any help would be much appreciated.
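The length mismatch described in the question can be reproduced and checked on a small example. This is only a sketch under stated assumptions: it uses base R's barplot() instead of barp2 (whose internals are not shown here), and `toy` is a made-up data frame with the same orientation as df (samples in rows, series in columns). barplot() draws one group per column of the matrix it receives, so the matrix is transposed first and names.arg then needs exactly one label per column.

```r
# Hypothetical toy data: rows are samples (x axis), columns are series (legend).
toy <- data.frame(a = c(4, 9, 8), b = c(4, 9, 9), c = c(3, 9, 8), d = c(4, 8, 8),
                  row.names = c("1a", "1b", "1c"))

# Transpose so that each sample becomes one column, i.e. one group of bars.
m <- t(as.matrix(toy))

# names.arg must have one entry per group (column of m),
# legend.text one entry per bar within a group (row of m).
stopifnot(length(rownames(toy)) == ncol(m),
          length(colnames(toy)) == nrow(m))

# barplot() returns the bar midpoints, one row per series and one column per group.
mids <- barplot(m, beside = TRUE,
                names.arg = rownames(toy),
                legend.text = colnames(toy),
                ylab = "y label", main = "main")
```

Whether barp2 accepts the same orientation is not confirmed here; the point is only that the number of labels has to match the number of groups actually drawn.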

Related

Removing one row of headers

My professor wants us to download an Excel file directly from the website. As part of the analysis, we need to compute the sum of some of the columns (the professor suggested that we use starts_with). The problem is that the column names I need (the second header, so to speak) are on the second line of the file, and R reads that line as an observation instead of as a proper header. I tried to delete the first row, but that removed the header row I wanted to keep. Here is the code. Initially, I tried this:
install.packages("tidyverse", dependencies = T)
install.packages("data.table", dependencies = T)
install.packages("readxl", dependencies = T)
install.packages("ggplot2", dependencies = T)
install.packages("openxlsx", dependencies = T)
library(tidyverse)
library(data.table)
library(readxl)
library(ggplot2)
library(openxlsx)
datatable <- data.table(openxlsx::read.xlsx('https://doi.org/10.1371/journal.pone.0242866.s001')) %>%
tail(-1)
Later on, I tried it in two steps: I read the file as above, but without the tail(-1), and on a second line I wrote:
dt <- dt[-1,]
I also tried something that I saw on the internet:
names(dt) = NULL
but it gave me this problem:
Error in View : Internal error: length of names (0) is not length of dt (61)
Can someone tell me the proper way? (For the second and third attempts I first assigned dt = datatable, which is why the object name differs from the first one.)
You just need to use the startRow parameter:
datatable <- data.table::data.table(openxlsx::read.xlsx('https://doi.org/10.1371/journal.pone.0242866.s001', startRow = 2))
Which makes your column names:
> names(datatable)
[1] "Time.mark" "EXP1" "CAL1" "EXP2" "CAL2"
[6] "EXP3" "CAL3" "EXP4" "CAL4" "EXP5"
[11] "CAL5" "EXP6" "CAL6" "EXP7" "CAL7"
[16] "EXP8" "CAL8" "EXP9" "CAL9" "EXP10"
[21] "CAL10" "EXP11" "CAL11" "EXP12" "CAL12"
[26] "EXP13" "CAL13" "EXP14" "CAL14" "EXP15"
[31] "CAL15" "EXP16" "CAL16" "EXP17" "CALIDAD/PRECIO.RTES"
[36] "EXP18" "CAL18" "EXP19" "CAL19" "SAT1"
[41] "SAT2" "SAT3" "SAT4" "SAT5" "SAT6"
[46] "SAT7" "LEALTAD1" "LEALTAD2" "LEALTAD3" "LEALTAD4"
[51] "LEALTAD5" "MEDINA.AZAHARA" "MEZQUITA-CATEDRAL" "ALCAZAR.REYES.CRISTIANOS" "SINAGOGA"
[56] "FIESTA.PATIOS" "IGLESIAS.FERNANDINAS" "BARRIO.JUDERIA" "Sex" "age"
[61] "level.of.study"
Then you can use starts_with() to select columns:
datatable %>%
select(starts_with('EXP'))
EXP1 EXP2 EXP3 EXP4 EXP5 EXP6 EXP7 EXP8 EXP9 EXP10 EXP11 EXP12 EXP13 EXP14 EXP15 EXP16 EXP17 EXP18 EXP19
1: 5 6 5 4 7 5 6 6 5 6 5 5 6 6 6 6 6 6 6
2: 6 7 7 6 6 7 6 7 7 7 5 7 7 6 7 6 6 6 6
3: 5 7 5 6 6 6 5 4 6 7 4 5 6 5 5 5 5 6 4
4: 5 4 6 5 7 6 2 5 6 6 7 7 7 5 5 6 6 5 4
5: 5 4 6 5 7 6 7 5 6 6 7 7 7 5 5 6 6 6 3
---
258: 6 6 6 6 5 6 7 5 7 6 7 6 5 5 7 5 6 6 5
259: 6 6 7 6 7 6 7 7 6 6 7 6 7 7 6 7 6 7 6
260: 6 6 7 6 6 6 6 6 5 7 6 7 6 7 7 6 7 7 7
261: 6 7 6 7 7 7 5 5 6 6 6 7 6 7 6 7 6 6 6
262: 5 7 7 6 6 7 6 5 6 7 5 7 7 5 7 6 7 6 6
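For the summing step mentioned in the question, here is a minimal sketch. It uses base R's startsWith() as a counterpart to the starts_with() helper so that it runs without extra packages, and `toy` is a made-up stand-in for the imported table, not the real data.

```r
# Hypothetical stand-in for the imported table (values are invented).
toy <- data.frame(EXP1 = c(5, 6), EXP2 = c(7, 4), CAL1 = c(4, 5), SAT1 = c(7, 6))

# Logical vector marking the columns whose name begins with "EXP".
exp_cols <- startsWith(names(toy), "EXP")

# Sum of each EXP column over all rows.
exp_col_sums <- colSums(toy[exp_cols])

# Sum of the EXP columns within each row, stored as a new column.
toy$EXP_total <- rowSums(toy[exp_cols])
```

The same column selection on the real table is the select(starts_with('EXP')) call shown above; only the summing is added here.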

Import data and use numeric as header

I'm attempting to replicate the output of the SensoMineR package (cartoconsumer). The example data set "hedo.cocktail" in the package appears to use numbers as column headers.
data(cocktail)
View(hedo.cocktail)
However, when I try to import dummy data with the same structure (1, 2, 3, ... as a header giving the consumer number), RStudio automatically prepends an "X" to each header. A dataset with these "X" headers does not produce the output; it raises an error instead:
Subscript out of bounds in Mat[rownames(MatH),]
My guess is that the data header does not match that of the example data set.
Is it possible to import data with numeric as a header?
Here's a sample of my dummy data. Thank you for your suggestions.
senso.cake
color odor flavor size
25.000 45.000 25.000 78.000
26.000 56.000 49.000 45.000
27.000 54.000 85.000 45.000
28.000 52.000 98.000 58.000
30.000 58.000 56.000 96.000
31.000 56.000 32.000 96.000
32.000 58.000 56.000 93.000
36.000 59.000 45.000 90.000
hedo.cake
1 2 3 4 5 6 7 8 9 10
9 7 7 4 8 9 6 7 8 7
6 4 4 2 4 7 8 7 7 7
7 6 8 7 7 6 7 7 7 6
8 8 6 4 7 8 6 8 6 8
4 5 7 3 8 6 6 8 6 7
7 6 7 3 7 6 6 7 7 8
8 6 8 6 7 7 8 7 4 9
6 3 5 6 4 4 6 7 2 7
You can turn your data frame into a matrix.
mat <- as.matrix(your_df)
mat <- matrix(data = mat, dimnames = NULL, ncol = ncol(your_df))
After that, you can simply rbind() whatever header you want onto it as the first row.
mat <- rbind(c(1:ncol(mat)), mat)
For example:
a <- c(5:8)
b <- c(4:7)
c <- c(3:6)
mat <- matrix(c(a,b,c), ncol = 3)
mat <- rbind(c(1:ncol(mat)), mat)
mat
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 5 4 3
[3,] 6 5 4
[4,] 7 6 5
[5,] 8 7 6
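As an alternative to adding the header as a matrix row, the "X" prefix can be avoided at import time. This is a sketch that assumes the data is read with read.csv() or read.table() (the question does not say which import route was used); both have a check.names argument that controls whether column names are rewritten into syntactically valid R names. The inline CSV text is a made-up sample. Whether this resolves the cartoconsumer error is not confirmed by the thread.

```r
# Small made-up CSV with numeric column headers.
csv_text <- "1,2,3\n9,7,7\n6,4,4"

# Default behaviour: names are made syntactically valid, so "X" is prepended.
with_x <- read.csv(text = csv_text)

# check.names = FALSE keeps the header exactly as written in the file.
as_is <- read.csv(text = csv_text, check.names = FALSE)

names(with_x)  # "X1" "X2" "X3"
names(as_is)   # "1" "2" "3"
```

Columns with numeric names then have to be accessed with backticks or strings, for example as_is[["1"]].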

Run function in R for mean

Based on my values, I need a function that gives the following results.
(image: expected results)
The function has to calculate the mean of the current value and the 3 previous values.
The function should be flexible, so that the same calculation can be applied with 2, 4, 5, or x previous values, for example the mean of the current value and the 2 previous values.
Please note that my data contains random numbers, not ascending numbers as in the example above.
What you need is a rolling mean. The argument k (4 in my example) is the integer width of the rolling window, so "current value plus 3 previous values" means k = 4. See the documentation page for the rollmean function of the zoo package, ?rollmean.
zoo
library(zoo)
library(dplyr)
df <- data.frame(number = 1:20)
df %>% mutate(rolling_avg = rollmean(number, k = 4 , fill = NA, align = "right"))
RcppRoll
library(RcppRoll)
df %>% mutate(rolling_avg = roll_mean(number, n = 4, fill = NA, align = "right"))
Output
number rolling_avg
1 1 NA
2 2 NA
3 3 NA
4 4 2.5
5 5 3.5
6 6 4.5
7 7 5.5
8 8 6.5
9 9 7.5
10 10 8.5
11 11 9.5
12 12 10.5
13 13 11.5
14 14 12.5
15 15 13.5
16 16 14.5
17 17 15.5
18 18 16.5
19 19 17.5
20 20 18.5
Using the other vector you provided in the comments:
df <- data.frame(number = c(1,-3,5,4,3,2,-4,5,6,-4,3,2,3,-4,5,6,6,3,2))
df %>% mutate(rolling_avg = rollmean(number, 4, fill = NA, align = "right"))
Output
number rolling_avg
1 1 NA
2 -3 NA
3 5 NA
4 4 1.75
5 3 2.25
6 2 3.50
7 -4 1.25
8 5 1.50
9 6 2.25
10 -4 0.75
11 3 2.50
12 2 1.75
13 3 1.00
14 -4 1.00
15 5 1.50
16 6 2.50
17 6 3.25
18 3 5.00
19 2 4.25
You can also use the rollify function in the tibbletime package to create a custom rolling function for any function. For mean it would look like this (using data from @mpalanco's answer):
library(dplyr)
library(tibbletime)
rolling_mean <- rollify(mean, window = 4)
df %>% mutate(moving_average = rolling_mean(number))
which gives you:
number moving_average
1 1 NA
2 2 NA
3 3 NA
4 4 2.5
5 5 3.5
6 6 4.5
7 7 5.5
8 8 6.5
9 9 7.5
10 10 8.5
11 11 9.5
12 12 10.5
13 13 11.5
14 14 12.5
15 15 13.5
16 16 14.5
17 17 15.5
18 18 16.5
19 19 17.5
20 20 18.5
The benefit of this approach is that it is easy to extend to things other than rolling average.
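If installing a package is not an option, the same right-aligned rolling mean can be sketched in base R with cumulative sums. The function name rolling_mean_prev and its n_prev argument are made up for this illustration; n_prev is the number of previous values to include, so the window width is n_prev + 1.

```r
# Mean of the current value and the n_prev values before it (NA until enough values exist).
rolling_mean_prev <- function(x, n_prev) {
  k <- n_prev + 1                 # window width
  n <- length(x)
  out <- rep(NA_real_, n)
  if (k > n) return(out)          # not enough data for a single full window
  cs <- cumsum(c(0, x))           # cs[i + 1] is the sum of x[1..i]
  out[k:n] <- (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
  out
}

asc <- rolling_mean_prev(1:20, n_prev = 3)
rnd <- rolling_mean_prev(c(1, -3, 5, 4, 3, 2, -4, 5, 6, -4, 3, 2, 3, -4, 5, 6, 6, 3, 2),
                         n_prev = 3)
```

With n_prev = 3 this should reproduce the rolling_avg columns shown above; changing n_prev to 2, 4, or 5 changes the window without touching the function.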

Grouped ranking in R

I have data with a primary key and ratio values like the following:
2.243164164
1.429242413
2.119270714
3.013427143
1.208634972
1.208634972
1.23657632
2.212136028
2.168583297
2.151961216
1.159886063
1.234106444
1.694206176
1.401425329
5.210125578
1.215267806
1.089189869
I want to add a rank column that groups these ratios into, say, 3 bins. The functionality should be similar to this SAS code:
PROC RANK DATA = TAB1 GROUPS = &NUM_BINS
I did the following:
Convert your vector to data frame.
Create variable Rank:
test2$rank<-rank(test2$test)
> test2
test rank
1 2.243164 15.0
2 1.429242 9.0
3 2.119271 11.0
4 3.013427 16.0
5 1.208635 3.5
6 1.208635 3.5
7 1.236576 7.0
8 2.212136 14.0
9 2.168583 13.0
10 2.151961 12.0
11 1.159886 2.0
12 1.234106 6.0
13 1.694206 10.0
14 1.401425 8.0
15 5.210126 17.0
16 1.215268 5.0
17 1.089190 1.0
Define a function that converts to percentile ranks, then define pr as that percentile.
percent.rank<-function(x) trunc(rank(x)/length(x)*100)
test3<-within(test2,pr<-percent.rank(rank))
Then I created the bins, given that you wanted 3 of them.
test3$bins <- cut(test3$pr, breaks=c(0,33,66,100), labels=c("0-33","34-66","66-100"))
test x rank pr bins
1 2.243164 15.0 15.0 88 66-100
2 1.429242 9.0 9.0 52 34-66
3 2.119271 11.0 11.0 64 34-66
4 3.013427 16.0 16.0 94 66-100
5 1.208635 3.5 3.5 20 0-33
6 1.208635 3.5 3.5 20 0-33
7 1.236576 7.0 7.0 41 34-66
8 2.212136 14.0 14.0 82 66-100
9 2.168583 13.0 13.0 76 66-100
10 2.151961 12.0 12.0 70 66-100
11 1.159886 2.0 2.0 11 0-33
12 1.234106 6.0 6.0 35 34-66
13 1.694206 10.0 10.0 58 34-66
14 1.401425 8.0 8.0 47 34-66
15 5.210126 17.0 17.0 100 66-100
16 1.215268 5.0 5.0 29 0-33
17 1.089190 1.0 1.0 5 0-33
Does that work for you?
A bit late, but given your data, we can use ntile from the dplyr package to get equal-sized groups:
df <- data.frame(values = c(2.243164164,
1.429242413,
2.119270714,
3.013427143,
1.208634972,
1.208634972,
1.23657632,
2.212136028,
2.168583297,
2.151961216,
1.159886063,
1.234106444,
1.694206176,
1.401425329,
5.210125578,
1.215267806,
1.089189869))
library(dplyr)
df <- df %>%
arrange(values) %>%
mutate(rank = ntile(values, 3))
values rank
1 1.089190 1
2 1.159886 1
3 1.208635 1
4 1.208635 1
5 1.215268 1
6 1.234106 1
7 1.236576 2
8 1.401425 2
9 1.429242 2
10 1.694206 2
11 2.119271 2
12 2.151961 2
13 2.168583 3
14 2.212136 3
15 2.243164 3
16 3.013427 3
17 5.210126 3
Or see cut_number from the ggplot2 package:
library(ggplot2)
df$rank2 <- cut_number(df$values, 3, labels = c(1:3))
values rank rank2
1 1.089190 1 1
2 1.159886 1 1
3 1.208635 1 1
4 1.208635 1 1
5 1.215268 1 1
6 1.234106 1 1
7 1.236576 2 2
8 1.401425 2 2
9 1.429242 2 2
10 1.694206 2 2
11 2.119271 2 2
12 2.151961 2 3
13 2.168583 3 3
14 2.212136 3 3
15 2.243164 3 3
16 3.013427 3 3
17 5.210126 3 3
Because your sample consists of 17 numbers, one bin holds 5 numbers while the other two hold 6 each. The two functions differ at row 12: ntile puts 6 numbers in the first and second groups, whereas cut_number puts 6 numbers in the first and third groups.
> table(df$rank)
1 2 3
6 6 5
> table(df$rank2)
1 2 3
6 5 6
See also here: Splitting a continuous variable into equal sized groups
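For comparison, quantile-based binning can also be sketched in base R by combining quantile() and cut(). This is an illustration only, not a claim about how ntile or cut_number are implemented; n_bins plays the role of GROUPS in the SAS call.

```r
values <- c(2.243164164, 1.429242413, 2.119270714, 3.013427143, 1.208634972,
            1.208634972, 1.23657632, 2.212136028, 2.168583297, 2.151961216,
            1.159886063, 1.234106444, 1.694206176, 1.401425329, 5.210125578,
            1.215267806, 1.089189869)
n_bins <- 3

# Break points at the 0, 1/3, 2/3 and 1 quantiles of the data.
breaks <- quantile(values, probs = seq(0, 1, length.out = n_bins + 1))

# include.lowest = TRUE keeps the minimum in the first bin; labels = FALSE returns 1..n_bins.
bin <- cut(values, breaks = breaks, include.lowest = TRUE, labels = FALSE)

table(bin)
```

Note that heavily tied data can produce duplicate break points, in which case cut() stops with an error and the breaks have to be made unique first.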

How to change column values in a data frame?

I have a simple data frame column transformation which can be done using an if/else loop, but I was wondering if there was a better way to do this.
The initial data frame is,
df <-data.frame(cbind(x=rep(10:15,3), y=0:8))
df
x y
1 10 0
2 11 1
3 12 2
4 13 3
5 14 4
6 15 5
7 10 6
8 11 7
9 12 8
10 13 0
11 14 1
12 15 2
13 10 3
14 11 4
15 12 5
16 13 6
17 14 7
18 15 8
what I need to do is replace the values in column 'y' such that
'0' gets replaced with '2',
'1' gets replaced with '2.2',
'2' gets replaced with '2.4',
...
...
'6' gets replaced with '3.2'
'7' gets replaced with '3.3'
'8' gets replaced with '10'
so that I end up with something like,
> df
x y
1 10 2.0
2 11 2.2
3 12 2.4
4 13 2.6
5 14 2.8
6 15 3.0
7 10 3.2
8 11 3.3
9 12 10.0
10 13 2.0
11 14 2.2
12 15 2.4
13 10 2.6
14 11 2.8
15 12 3.0
16 13 3.2
17 14 3.3
18 15 10.0
I have searched and found several proposals but couldn't get them to work. One of the attempts was something like,
> levels(factor(df$y)) <- c(2,2.2,2.4,2.6,2.8,3,3.2,3.3,10)
Error in levels(factor(df$y)) <- c(2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.3, :
could not find function "factor<-"
But I get the error message shown above.
Can anyone help me with this?
Use the fact that y+1 is an index into the vector of replacement values, something like:
replacement <- c(2,2.2,2.4,2.6,2.8,3,3.2,3.3,10)
df <- within(df, z <- replacement[y+1])
Or, using data.table for syntactic sugar and memory efficiency:
library(data.table)
DT <- as.data.table(df)
DT[, z := replacement[y+1]]
How about:
mylevels <- c(2,2.2,2.4,2.6,2.8,3,3.2,3.3,10)
df$z <- as.numeric(as.character(factor(df$y,labels=mylevels)))
This also matches your desired outcome:
transform(df,z=ifelse(y==7,3.3,ifelse(y==8,10,2+y/5)))
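When the old values are not a tidy 0..n sequence, the y+1 indexing trick no longer applies. A named lookup vector is one way to cover that case. This is a sketch; the name `lookup` is made up here, and the data frame is rebuilt so the block runs on its own.

```r
df <- data.frame(x = rep(10:15, 3), y = rep(0:8, 2))

# Names are the old values (as strings), elements are the replacements.
lookup <- c("0" = 2, "1" = 2.2, "2" = 2.4, "3" = 2.6, "4" = 2.8,
            "5" = 3, "6" = 3.2, "7" = 3.3, "8" = 10)

# Index by name; unname() drops the names carried over from the lookup vector.
df$z <- unname(lookup[as.character(df$y)])
```

Any y value missing from lookup comes back as NA, which makes unmapped values easy to spot.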
