I'm not very experienced with R, so maybe somebody can help me.
I used the pretty() function to compute breaks for some histograms (breaks=), but sometimes got an error. Further testing led to the following results:
pretty(-1:1,n=20)
[1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
That's what I expected, and the same goes for the following line of code:
pretty(-26:71,n=18)
[1] -30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75
BUT here comes the problem: the following pretty() call gave a completely wrong result (it doesn't even get above zero):
pretty(-0.26:0.71,n=10)
[1] -0.35 -0.30 -0.25 -0.20
Can anybody explain to me what I'm doing wrong? Thanks for your help!
The description of pretty from help(pretty) does a good job explaining the purpose:
Compute a sequence of about n+1 equally spaced ‘round’ values which cover the range of the values in x. The values are chosen so that they are 1, 2 or 5 times a power of 10.
Your first call included the sequence -1:1:
-1:1
[1] -1 0 1
pretty(-1:1,n=20)
[1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2
[14] 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
You got some nice values that cover -1 to 1. The same is true for your second call.
However, your third call included the sequence -0.26:0.71. The following are equivalent:
-0.26:0.71
[1] -0.26
seq(-0.26,0.71,by = 1)
[1] -0.26
Since -0.26 and 0.71 are less than 1 apart and the : operator steps by 1, the sequence contains only its starting value, -0.26.
Your call is therefore the same as pretty(-0.26,n=10), which makes a nice sequence around -0.26:
pretty(-0.26,n=10)
[1] -0.35 -0.30 -0.25 -0.20
Perhaps instead you'd prefer this:
pretty(c(-0.26,0.71),n=10)
[1] -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
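For the histogram use case from the question, a minimal sketch of the same idea is to hand pretty() the range of the data rather than a : sequence. The vector x below is made-up illustration data, not the asker's.

```r
# hypothetical data whose extremes match the question's -0.26 and 0.71
x <- c(-0.26, 0.10, 0.33, 0.71)

# the colon operator steps by 1, so this "sequence" has a single element
length(-0.26:0.71)

# range(x) returns c(min, max); pretty() then covers the whole span
br <- pretty(range(x), n = 10)
br

# the breaks can be passed on to hist() without dropping any value
h <- hist(x, breaks = br, plot = FALSE)
sum(h$counts) == length(x)
```

Because the breaks cover min(x) and max(x), hist() should no longer complain that some values are not counted.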
The code underlying pretty is somewhat complicated. But you can see the function by typing pretty.default at the console and reviewing the C source here.
Another way of dealing with it is to multiply by 10 and then divide by 10 again.
pretty((10*-0.26):(10*0.71), n=10)/10
[1] -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
I have a dataframe (df) with average Current Intensities for two different sites (A and B) and for different depths (5 meters C.I.5m, 12 meters C.I.12m, 20 meters C.I.20m, 28 meters C.I.28m and 35 meters C.I.35m) over time. Here I show an example:
df<- data.frame(Datetime=c("2018-08-06 00:00:00","2018-08-06 00:00:00","2018-08-06 03:00:00","2018-08-06 03:00:00","2018-08-06 06:00:00","2018-08-06 06:00:00"),
Site=c("A","B","A","B","A","B"),
C.I.5m=c(0.1,0.3,0.8,0.2,0.4,0.2),
C.I.12m=c(0.2,0.1,0.6,0.3,0.2,0.4),
C.I.20m=c(0.1,0.3,0.7,0.4,0.4,0.2),
C.I.28m=c(0.2,0.3,0.4,0.1,0.1,0.2),
C.I.35m=c(0.3,0.5,0.2,0.3,0.4,0.1))
df
Datetime Site C.I.5m C.I.12m C.I.20m C.I.28m C.I.35m
1 2018-08-06 00:00:00 A 0.1 0.2 0.1 0.2 0.3
2 2018-08-06 00:00:00 B 0.3 0.1 0.3 0.3 0.5
3 2018-08-06 03:00:00 A 0.8 0.6 0.7 0.4 0.2
4 2018-08-06 03:00:00 B 0.2 0.3 0.4 0.1 0.3
5 2018-08-06 06:00:00 A 0.4 0.2 0.4 0.1 0.4
6 2018-08-06 06:00:00 B 0.2 0.4 0.2 0.2 0.1
I want to calculate how much the current intensity differs among depths (that is, among columns in my dataframe) using several variables. The first, which I call MCICC (Maximum Current Intensity Change in Column), is the maximum difference among the values of all current-intensity columns (C.I.5m, C.I.12m, C.I.20m, C.I.28m and C.I.35m). The second, MCIC10m, is the difference between C.I.5m and C.I.12m. The third, MCIC20m, is the maximum difference among C.I.12m, C.I.20m and C.I.28m. Finally, MCIC30m is the difference between C.I.28m and C.I.35m.
I would expect this:
> df
Datetime Site C.I.5m C.I.12m C.I.20m C.I.28m C.I.35m MWCICC MWCIC10 MWCIC20 MWCIC30
1 2018-08-06 00:00:00 A 0.1 0.2 0.1 0.2 0.3 0.2 0.1 0.1 0.1
2 2018-08-06 00:00:00 B 0.3 0.1 0.3 0.3 0.5 0.4 0.2 0.2 0.2
3 2018-08-06 03:00:00 A 0.8 0.6 0.7 0.4 0.2 0.6 0.2 0.3 0.2
4 2018-08-06 03:00:00 B 0.2 0.3 0.4 0.1 0.3 0.3 0.1 0.3 0.2
5 2018-08-06 06:00:00 A 0.4 0.2 0.4 0.1 0.4 0.3 0.2 0.3 0.3
6 2018-08-06 06:00:00 B 0.2 0.4 0.2 0.2 0.1 0.3 0.2 0.2 0.1
The tricky point is that each new variable is calculated from a different number of primary columns. MCICC takes all 5 depths into account (five columns), MCIC10 takes the 5 and 12 meter depths into account (two columns), MCIC20 takes the 12, 20 and 28 meter depths into account (three columns) and MCIC30 takes the 28 and 35 meter depths into account (two columns).
Does anyone know how to calculate all at once?
We can use combn to calculate the pairwise differences between columns and then take the row-wise maximum of their absolute values with pmax:
f1 <- function(data) {
do.call(pmax, as.data.frame(abs(combn(data, 2,
FUN = function(x) x[, 1]- x[,2]))))
}
MWCICC <- f1(df[-c(1:2)])
MCIC10m <- f1(df[c("C.I.5m", "C.I.12m")])
MCIC20m <- f1(df[c("C.I.12m", "C.I.20m", "C.I.28m")])
MCIC30m <- f1(df[c("C.I.28m", "C.I.35m")])
df[c("MWCICC", "MCIC10m", "MCIC20m", "MCIC30m")] <- cbind(MWCICC,
MCIC10m, MCIC20m, MCIC30m)
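As a cross-check on the combn approach, the largest pairwise absolute difference within a row equals the row maximum minus the row minimum. The sketch below uses that shortcut instead of combn; the helper name max_range is made up for illustration, and the data are the question's depth columns.

```r
# the depth columns from the question's example data
df <- data.frame(
  Site    = c("A", "B", "A", "B", "A", "B"),
  C.I.5m  = c(0.1, 0.3, 0.8, 0.2, 0.4, 0.2),
  C.I.12m = c(0.2, 0.1, 0.6, 0.3, 0.2, 0.4),
  C.I.20m = c(0.1, 0.3, 0.7, 0.4, 0.4, 0.2),
  C.I.28m = c(0.2, 0.3, 0.4, 0.1, 0.1, 0.2),
  C.I.35m = c(0.3, 0.5, 0.2, 0.3, 0.4, 0.1)
)

# hypothetical helper: row-wise max minus row-wise min over the given columns
max_range <- function(d) {
  cols <- unname(as.list(d))          # drop names so they are not read as arguments
  do.call(pmax, cols) - do.call(pmin, cols)
}

df$MWCICC  <- max_range(df[c("C.I.5m", "C.I.12m", "C.I.20m", "C.I.28m", "C.I.35m")])
df$MCIC10m <- max_range(df[c("C.I.5m", "C.I.12m")])
df$MCIC20m <- max_range(df[c("C.I.12m", "C.I.20m", "C.I.28m")])
df$MCIC30m <- max_range(df[c("C.I.28m", "C.I.35m")])
df
```

Results may differ from the printed expectation by floating-point noise (for example 0.3 - 0.1), so compare with all.equal() rather than ==.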
I have searched all the lapply questions and solutions, and none of those solutions seems to address and/or work for the following...
I have a list "temp" that contains the names of 100 data frames: "sim_rep1.dat" through "sim_rep100.dat".
Each data frame has 2000 observations and the same 11 variables: ARAND and w1-w10, all of which are numeric.
For all 100 data frames, I am trying to create a new variable called "ps_true" that incorporates certain of the "w" variables, each with a unique coefficient.
The only use of lapply that is working for me is the following:
lapply(mget(paste0("sim_rep", 1:100,".dat")), transform,
ps_true = (1 + exp(-(0.8*w1 - 0.25*w2 + 0.6*w3 -
0.4*w4 - 0.8*w5 - 0.5*w6 + 0.7*w7)))^-1)
When I run the code above, R loops through all 100 data frames and shows newly calculated values for ps_true in the console. Unfortunately, the new column is not getting added to the data frames.
When I try to create a function, the wheels come completely off.
I have tried different variations of the following:
lapply(temp, function(x){
ps_true = (1 + exp(-(0.8*w1 - 0.25*w2 + 0.6*w3 -
0.4*w4 - 0.8*w5 - 0.5*w6 + 0.7*w7)))^-1
cbind(x, ps_true)
return(x)
})
Error in FUN(X[[i]], ...) : object 'w1' not found results from the function shown above
Error in x$w1 : $ operator is invalid for atomic vectors results if I try to reference x$w1 instead
Error in FUN(X[[i]], ...) : object 'w1' not found results if I try to reference x[[w1]] instead
Error in x[["w1"]] : subscript out of bounds results if I try to reference x[["w1"]] instead
I am hoping there is something obvious that I am missing. I'd appreciate your insights and suggestions to solve this frustrating problem.
In response to Uwe's addendum:
The code I had used to read all the files was the following:
temp = list.files(pattern='*.dat')
for (i in 1:length(temp)) {
assign(temp[i], read.csv(temp[i], header=F,sep="",
col.names = c("ARAND", "w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9", "w10")))
}
According to the OP, there are 100 data.frames with identical columns names. The OP wants to create a new column in all of the data.frames using exactly the same formula.
This indicates a fundamental flaw in the design of the data structure. I guess no database admin would create 100 identical tables where only the data contents differ. Instead, they would create one table with an additional column identifying the origin of each row. Then, all subsequent operations would be applied to one table instead of being repeated for each of many.
In R, the data.table package has the convenient rbindlist() function which can be used for this purpose:
library(data.table) # CRAN version 1.10.4 used
# get list of data.frames from the given names and
# combine the rows of all data sets into one large data.table
DT <- rbindlist(mget(temp), idcol = "origin")
# now create new column for all rows across all data sets
DT[, ps_true := (1 + exp(-(0.8*w1 - 0.25*w2 + 0.6*w3 -
0.4*w4 - 0.8*w5 - 0.5*w6 + 0.7*w7)))^-1]
DT
origin ARAND w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 ps_true
1: sim_rep1.dat -0.6 -0.5 0.2 -0.7 0.5 2.4 -0.2 -0.9 -1.1 0.3 -0.8 0.0287485
2: sim_rep1.dat -0.2 0.2 0.7 1.0 1.8 -0.2 0.8 0.3 -1.3 -1.6 -0.2 0.4588433
3: sim_rep1.dat 1.6 -0.5 0.7 -0.7 -1.7 0.9 -1.2 -1.0 1.1 -0.3 -2.1 0.2432395
4: sim_rep1.dat 0.1 1.2 -1.3 -0.1 0.3 -0.6 0.4 0.3 0.8 -1.2 -1.7 0.8313184
5: sim_rep1.dat 0.1 0.2 -2.0 0.6 -0.3 0.2 0.2 0.5 -0.9 -0.8 -1.1 0.7738186
---
199996: sim_rep100.dat 0.1 -1.4 1.6 -0.7 -1.0 -0.6 0.8 -0.6 -0.5 -0.4 -0.8 0.1323889
199997: sim_rep100.dat 0.3 1.3 -2.4 -0.7 -0.4 0.0 1.0 -0.2 1.0 -0.1 0.3 0.6769959
199998: sim_rep100.dat 0.3 1.2 0.0 -1.3 -0.8 -0.7 -0.3 0.1 0.9 0.9 -1.3 0.7824498
199999: sim_rep100.dat 0.5 -0.7 0.2 0.5 1.1 -0.3 0.3 -0.5 -0.8 1.9 -0.7 0.2669799
200000: sim_rep100.dat -0.5 1.1 0.8 0.2 -0.6 -0.5 -0.4 1.1 -1.8 0.9 -1.3 0.9175867
DT now consists of 200 k rows. Performance is no reason to worry, as data.table was built to deal with large (even larger) data efficiently.
The origin of each row can be identified in case the data of the individual data sets need to be treated separately. E.g.,
DT[origin == "sim_rep47.dat"]
origin ARAND w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 ps_true
1: sim_rep47.dat -0.6 -0.5 0.2 -0.7 0.5 2.4 -0.2 -0.9 -1.1 0.3 -0.8 0.0287485
2: sim_rep47.dat -0.2 0.2 0.7 1.0 1.8 -0.2 0.8 0.3 -1.3 -1.6 -0.2 0.4588433
3: sim_rep47.dat 1.6 -0.5 0.7 -0.7 -1.7 0.9 -1.2 -1.0 1.1 -0.3 -2.1 0.2432395
4: sim_rep47.dat 0.1 1.2 -1.3 -0.1 0.3 -0.6 0.4 0.3 0.8 -1.2 -1.7 0.8313184
5: sim_rep47.dat 0.1 0.2 -2.0 0.6 -0.3 0.2 0.2 0.5 -0.9 -0.8 -1.1 0.7738186
---
1996: sim_rep47.dat 0.1 -1.4 1.6 -0.7 -1.0 -0.6 0.8 -0.6 -0.5 -0.4 -0.8 0.1323889
1997: sim_rep47.dat 0.3 1.3 -2.4 -0.7 -0.4 0.0 1.0 -0.2 1.0 -0.1 0.3 0.6769959
1998: sim_rep47.dat 0.3 1.2 0.0 -1.3 -0.8 -0.7 -0.3 0.1 0.9 0.9 -1.3 0.7824498
1999: sim_rep47.dat 0.5 -0.7 0.2 0.5 1.1 -0.3 0.3 -0.5 -0.8 1.9 -0.7 0.2669799
2000: sim_rep47.dat -0.5 1.1 0.8 0.2 -0.6 -0.5 -0.4 1.1 -1.8 0.9 -1.3 0.9175867
extracts all rows belonging to data set sim_rep47.dat.
Data
For test and demonstration, I've created 100 sample data.frames using the code below:
# create vector of file names
temp <- paste0("sim_rep", 1:100, ".dat")
# create one sample data.frame
nr <- 2000L
nc <- 11L
set.seed(123L)
foo <- as.data.frame(matrix(round(rnorm(nr * nc), 1), nrow = nr))
names(foo) <- c("ARAND", paste0("w", 1:10))
str(foo)
# create 100 individually named data.frames by "copying" foo
for (t in temp) assign(t, foo)
# print warning message on using assign
fortunes::fortune(236)
# verify objects have been created
ls()
Addendum: Reading all files at once
The OP has named the single data.frames sim_rep1.dat, sim_rep2.dat, etc. which resemble typical file names. Just in case the OP indeed has 100 files on disk I would like to suggest a way to read all files at once. Let's suppose all files are stored in one directory.
# path to data directory
data_dir <- file.path("path", "to", "data", "directory")
# create vector of file paths
files <- dir(data_dir, pattern = "sim_rep\\d+\\.dat", full.names = TRUE)
# read all files and create one large data.table
# NB: it might be necessary to add parameters to fread()
# or to use another file reader depending on the file type
DT <- rbindlist(lapply(files, fread), idcol = "origin")
# rename origin to contain the file names without path
DT[, origin := factor(origin, labels = basename(files))]
DT
origin ARAND w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 ps_true
1: sim_rep1.dat -0.6 -0.5 0.2 -0.7 0.5 2.4 -0.2 -0.9 -1.1 0.3 -0.8 0.0287485
2: sim_rep1.dat -0.2 0.2 0.7 1.0 1.8 -0.2 0.8 0.3 -1.3 -1.6 -0.2 0.4588433
3: sim_rep1.dat 1.6 -0.5 0.7 -0.7 -1.7 0.9 -1.2 -1.0 1.1 -0.3 -2.1 0.2432395
4: sim_rep1.dat 0.1 1.2 -1.3 -0.1 0.3 -0.6 0.4 0.3 0.8 -1.2 -1.7 0.8313184
5: sim_rep1.dat 0.1 0.2 -2.0 0.6 -0.3 0.2 0.2 0.5 -0.9 -0.8 -1.1 0.7738186
---
199996: sim_rep99.dat 0.1 -1.4 1.6 -0.7 -1.0 -0.6 0.8 -0.6 -0.5 -0.4 -0.8 0.1323889
199997: sim_rep99.dat 0.3 1.3 -2.4 -0.7 -0.4 0.0 1.0 -0.2 1.0 -0.1 0.3 0.6769959
199998: sim_rep99.dat 0.3 1.2 0.0 -1.3 -0.8 -0.7 -0.3 0.1 0.9 0.9 -1.3 0.7824498
199999: sim_rep99.dat 0.5 -0.7 0.2 0.5 1.1 -0.3 0.3 -0.5 -0.8 1.9 -0.7 0.2669799
200000: sim_rep99.dat -0.5 1.1 0.8 0.2 -0.6 -0.5 -0.4 1.1 -1.8 0.9 -1.3 0.9175867
All data sets are now stored in one large data.table DT consisting of 200 k rows. However, the order of data sets is different as files is sorted alphabetically, i.e.,
head(files)
[1] "./data/sim_rep1.dat" "./data/sim_rep10.dat" "./data/sim_rep100.dat"
[4] "./data/sim_rep11.dat" "./data/sim_rep12.dat" "./data/sim_rep13.dat"
You probably just need single brackets, applied to a list of the data frames themselves rather than a list of their names:
test = data.frame('w1' = c(1,2,3),'w2' = c(2,3,4))
temp = list(test,test,test)
temp2 = lapply(temp,function(x){cbind(x,setNames(x['w1'] + x['w2'],'ps_true'))})
temp2
[[1]]
w1 w2 ps_true
1 1 2 3
2 2 3 5
3 3 4 7
[[2]]
w1 w2 ps_true
1 1 2 3
2 2 3 5
3 3 4 7
[[3]]
w1 w2 ps_true
1 1 2 3
2 2 3 5
3 3 4 7
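The error messages in the question are consistent with temp holding character strings (the object names) rather than data frames, and with the lapply() result never being assigned. A minimal sketch under those assumptions follows; the three small data frames are made-up stand-ins for the 100 real ones.

```r
# hypothetical toy data: three small data frames named like the OP's objects
temp <- paste0("sim_rep", 1:3, ".dat")
set.seed(1)
for (nm in temp) {
  assign(nm, as.data.frame(matrix(rnorm(5 * 7), nrow = 5,
                                  dimnames = list(NULL, paste0("w", 1:7)))))
}

# mget() turns the vector of names into a named list of data frames;
# the result of lapply() is stored, otherwise the new column is lost
dfs <- lapply(mget(temp), function(x) {
  x$ps_true <- with(x, 1 / (1 + exp(-(0.8*w1 - 0.25*w2 + 0.6*w3 -
                                       0.4*w4 - 0.8*w5 - 0.5*w6 + 0.7*w7))))
  x  # return the modified data frame, not the untouched input
})

names(dfs)
head(dfs[["sim_rep1.dat"]])
```

The original objects in the workspace stay unchanged; the versions with ps_true live in the list dfs.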
I have a data table similar to this except much larger:
set.seed(1)
dt <- data.table(t1=round(rnorm(5),1), t2=round(rnorm(5),1), t3=round(rnorm(5),1),
t4=round(rnorm(5),1), t5=round(rnorm(5),1), t6=round(rnorm(5),1),
t7=round(rnorm(5),1),t8=round(rnorm(5),1))
Which outputs:
t1 t2 t3 t4 t5 t6 t7 t8
1: -0.6 -0.8 1.5 0.0 0.9 -0.1 1.4 -0.4
2: 0.2 0.5 0.4 0.0 0.8 -0.2 -0.1 -0.4
3: -0.8 0.7 -0.6 0.9 0.1 -1.5 0.4 -0.1
4: 1.6 0.6 -2.2 0.8 -2.0 -0.5 -0.1 1.1
5: 0.3 -0.3 1.1 0.6 0.6 0.4 -1.4 0.8
I would like to rename columns t3:t8 as hour_t3:hour_t8, to output like this:
t1 t2 hour_t3 hour_t4 hour_t5 hour_t6 hour_t7 hour_t8
1: -0.6 -0.8 1.5 0.0 0.9 -0.1 1.4 -0.4
2: 0.2 0.5 0.4 0.0 0.8 -0.2 -0.1 -0.4
3: -0.8 0.7 -0.6 0.9 0.1 -1.5 0.4 -0.1
4: 1.6 0.6 -2.2 0.8 -2.0 -0.5 -0.1 1.1
5: 0.3 -0.3 1.1 0.6 0.6 0.4 -1.4 0.8
These two methods work:
names(dt)[3:8] <- c(paste0("hour_t", 3:8))
and
setnames(dt, 3:8, c(paste0("hour_t", 3:8)))
but, I would like to be able to subset by reference using something like this:
setnames(dt, "t3":"t8", c(paste0("hour_t", 3:8)))
When I use such syntax or subset with c("t3":"t8"), I get the following error:
Error in "t3":"t8" : NA/NaN argument
In addition: Warning messages:
1: In setnames(dt, c("t3":"t8"), c(paste0("hour_t", 3:8))) :
NAs introduced by coercion
2: In setnames(dt, c("t3":"t8"), c(paste0("hour_t", 3:8))) :
NAs introduced by coercion
Any thoughts on how to subset the columns to rename by reference/column name instead of by position would be greatly appreciated. Thanks.
I am still quite new to data.table and am using data.table version 1.9.6.
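The error arises because the : operator tries to coerce "t3" and "t8" to numbers. One possible workaround, sketched here in base R on a made-up data frame d, is to look up the positions of the two boundary names with match() and build the vector of column names from them.

```r
# toy data frame with the same column names as in the question
d <- as.data.frame(matrix(0, nrow = 2, ncol = 8,
                          dimnames = list(NULL, paste0("t", 1:8))))

# positions of the first and last column to rename, looked up by name
idx  <- match("t3", names(d)):match("t8", names(d))
cols <- names(d)[idx]          # the names between "t3" and "t8", inclusive

names(d)[idx] <- paste0("hour_", cols)
names(d)

# with a data.table, the same 'cols' vector could be passed to setnames(),
# which renames by reference:
# setnames(dt, cols, paste0("hour_", cols))
```

This relies on the columns being contiguous and in the expected order; match() returns NA if either boundary name is missing.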
I have a matrix like this:
> brdrs <- matrix(c(-1,-0.2,0.2,3),ncol=2,byrow=TRUE)
> brdrs
[,1] [,2]
[1,] -1.0 -0.2
[2,] 0.2 3.0
I want to build a sequence based on this matrix. The first column is the start of an interval, the second its end. Each row defines the interval of one sequence.
For example, the result would run from -1.0 to -0.2 AND from 0.2 to 3.0, in steps of 0.1.
Is it possible without loops?
Thanks
You could use this :
unlist(sapply(1:nrow(brdrs),function(x){seq(brdrs[x,1],brdrs[x,2],0.1)}))
[1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
[18] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6
[35] 2.7 2.8 2.9 3.0
As Ananda Mahto said in the comment, you should specify the increment of the sequence. Here, I used 0.1.
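A variant of the same idea, offered as a sketch rather than a replacement, uses Map() to walk over the start and end columns in parallel. Map() always returns a list, whereas sapply() may simplify to a matrix when all sequences happen to have the same length.

```r
brdrs <- matrix(c(-1, -0.2, 0.2, 3), ncol = 2, byrow = TRUE)

# seq() is called once per row: seq(start, end, by = 0.1)
res <- unlist(Map(seq, brdrs[, 1], brdrs[, 2], by = 0.1))

length(res)
range(res)
```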
Below is the data I am working with. I do a simple hist(data) and the frequency of -.3 through .4 are correct. However, for some reason R seems to combine the frequency of -.5 and -.4, the two left most bins. There are 3 counts of -.5 and 5 counts of -.4, but R plots 8 counts of both -.5 and -.4.
Any idea why this may be going on? How to fix it?
[1] -0.1 0.0 0.1 0.1 0.3 0.0 0.0 0.1 0.1 0.1 0.2 0.1 -0.1 0.2 0.0
[16] -0.4 0.2 0.0 -0.1 0.0 0.1 0.1 -0.1 0.0 0.0 0.1 0.0 -0.1 0.0 0.3
[31] -0.2 0.4 -0.1 0.0 -0.2 0.0 0.1 0.1 0.0 0.1 0.2 -0.1 0.1 0.1 -0.1
[46] 0.2 0.1 -0.1 0.1 0.0 -0.1 0.4 -0.1 -0.1 0.0 0.0 -0.1 0.1 0.1 0.0
[61] 0.1 -0.1 0.2 -0.1 0.1 -0.1 0.0 0.1 0.0 0.1 0.0 0.1 0.0 -0.1 0.1
[76] 0.2 -0.2 0.0 0.0 -0.1 0.2 0.0 0.0 0.0 -0.3 0.0 -0.1 -0.1 0.1 -0.2
[91] -0.1 -0.3 -0.1 -0.3 -0.2 -0.2 0.0 0.0 0.0 -0.2 0.1 0.0 0.0 0.1 0.0
[106] 0.0 -0.2 -0.1 0.2 -0.1 0.0 -0.1 -0.1 -0.2 0.1 0.1 0.0 0.1 0.2 0.1
[121] 0.0 0.1 -0.2 0.2 0.0 0.0 0.1 0.1 0.0 -0.1 0.1 0.0 0.1 -0.1 0.2
[136] 0.0 0.1 0.1 0.0 0.1 -0.1 0.0 0.0 0.1 0.2 -0.1 0.1 0.0 0.1 0.0
[151] -0.1 0.0 0.2 0.1 -0.1 0.1 -0.2 0.1 0.1 -0.1 0.1 -0.2 -0.1 0.1 -0.1
[166] 0.0 0.0 -0.3 0.0 0.1 -0.2 0.1 -0.4 -0.2 -0.2 -0.3 0.0 -0.4 -0.3 -0.5
[181] -0.5 -0.5 -0.4 -0.3 -0.4 -0.1 0.0 -0.1 -0.2 -0.2 0.1 0.0 0.2 -0.1 -0.1
[196] 0.0 0.3 0.2 -0.1 0.0 0.0 0.0 -0.3 0.4 0.3 0.1 0.0 -0.1 0.1 -0.1
[211] 0.1 0.0 0.0 0.2 0.2 0.1 0.3 -0.1 0.1 0.0 0.0 0.0 0.0 0.1 0.3
[226] 0.0 0.0 -0.1 0.0 0.2 0.2 0.0 0.0 0.0 0.2 0.1 0.0 0.0 0.2 0.3
[241] 0.1 -0.1 0.0 0.4 0.0 0.2 -0.1 0.1
Here is the output of the histogram. You can see 8 counts of -.5 and -.4, which isn't in the data
$breaks
[1] -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4
$counts
[1] 8 8 17 46 75 60 23 7 4
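The comments above explain what's happening: the breaks are the left and right limits of the intervals, not the centers. With the defaults of hist() (right = TRUE, include.lowest = TRUE), the first bin is the closed interval [-0.5, -0.4], so it collects the 3 values of -0.5 and the 5 values of -0.4, which gives 8.
How to fix it? If you are dealing only with numbers that are integer multiples of 0.1, you can set your breaks at ..., -0.05, 0.05, 0.15, ... by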
data <- c(-0.5, -0.5, 0.4)
breaks <- ((min(data)*10):(max(data)*10+1))/10-0.05
result <- hist(data, breaks)
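The effect of break placement can be seen on a tiny made-up vector. In this sketch the integers 1, 2, 3 stand in for values measured in tenths, which sidesteps floating-point issues; it is an illustration, not the asker's data.

```r
# toy values: three 1s, two 2s, two 3s
x <- c(1, 1, 1, 2, 2, 3, 3)

# breaks ON the values: the first bin is the closed interval [1, 2],
# so the 1s and the 2s are counted together
on_values <- hist(x, breaks = 1:3, plot = FALSE)$counts
on_values   # 5 2

# breaks BETWEEN the values: one bin per distinct value
centred <- hist(x, breaks = seq(0.5, 3.5, by = 1), plot = FALSE)$counts
centred     # 3 2 2
```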
But is a histogram really what you need for this? It seems that you just want to count the number of occurrences of each value, which is much easier with
data <- c(-0.5, -0.5, 0.4)
aggregate(data, list(data), "length")
returning
Group.1 x
1 -0.5 2
2 0.4 1
And for plotting, have a look at barplot.
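A minimal sketch of that suggestion, using table() as one common way to get the counts and the same three-value toy vector as above:

```r
data <- c(-0.5, -0.5, 0.4)

# count the occurrences of each distinct value
counts <- table(data)
counts

# one bar per distinct value, labelled with the value itself
barplot(counts, xlab = "value", ylab = "frequency")
```

Note that barplot() only draws bars for values that occur, so gaps between values (here -0.4 to 0.3) do not appear on the axis.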