how to plot stack bar chart from table dataset - r

> mytable
Var1 Home Work
1 10 523 196
2 11 395 155
3 12 829 330
4 13 680 282
5 24 2897 1210
6 35 4689 1950
7 46 1291 567
8 47 1099 465
9 48 1681 764
10 59 1962 821
I created the stack bar chart in Excel.
How to create the same chart in R?
i tried the ggplot or the based function, but not work

Related

Is there an R function that turns a frequency table into a prop table?

What is the simplest way of turning a frequency data table into a prop table in R?
This is the data:
Time Total Blog News Social.Network Microblog Other Forums Pictures Video
1 15.KW 2022 1816 23 326 39 678 99 27 523 0
2 16.KW 2022 2535 32 690 42 815 135 26 644 1
3 17.KW 2022 2181 20 362 79 805 110 14 634 1
4 18.KW 2022 2583 19 895 25 692 127 6 658 0
5 19.KW 2022 2337 21 555 22 908 148 8 599 0
6 20.KW 2022 2091 23 392 18 851 119 5 554 0
7 21.KW 2022 1658 17 344 16 650 129 1 417 0
8 22.KW 2022 2476 24 798 24 937 150 7 443 0
9 23.KW 2022 1687 14 341 17 691 102 9 400 0
10 24.KW 2022 2476 21 521 29 984 110 19 509 0
11 25.KW 2022 2412 22 696 31 845 115 29 561 0
12 26.KW 2022 2197 22 715 13 709 128 59 445 0
13 27.KW 2022 2111 20 429 10 937 86 28 474 1
14 28.KW 2022 752 5 121 4 373 42 3 172 0
Your data frame df has a 2nd column called Total. It seems that you want to divide subsequent columns by this one.
df[-1] <- df[-1] / df$Total
After this, the 1st column Time does not change. 2nd column Total becomes 1. Other columns become proportions.

Putting several rows into one column in R

I am trying to run a time series analysis on the following data set:
Year 1771 1772 1773 1774 1775 1776 1777 1778 1779 1780
Number 101 82 66 35 31 7 20 92 154 125
Year 1781 1782 1783 1784 1785 1786 1787 1788 1789 1790
Number 85 68 38 23 10 24 83 132 131 118
Year 1791 1792 1793 1794 1795 1796 1797 1798 1799 1800
Number 90 67 60 47 41 21 16 6 4 7
Year 1801 1802 1803 1804 1805 1806 1807 1808 1809 1810
Number 14 34 45 43 48 42 28 10 8 2
Year 1811 1812 1813 1814 1815 1816 1817 1818 1819 1820
Number 0 1 5 12 14 35 46 41 30 24
Year 1821 1822 1823 1824 1825 1826 1827 1828 1829 1830
Number 16 7 4 2 8 17 36 50 62 67
Year 1831 1832 1833 1834 1835 1836 1837 1838 1839 1840
Number 71 48 28 8 13 57 122 138 103 86
Year 1841 1842 1843 1844 1845 1846 1847 1848 1849 1850
Number 63 37 24 11 15 40 62 98 124 96
Year 1851 1852 1853 1854 1855 1856 1857 1858 1859 1860
Number 66 64 54 39 21 7 4 23 55 94
Year 1861 1862 1863 1864 1865 1866 1867 1868 1869 1870
Number 96 77 59 44 47 30 16 7 37 74
My problem is that the data is placed in multiple rows. I am trying to make two columns from the data. One for Year and one for Number, so that it is easily readable in R. I have tried
> library(tidyverse)
> sun.df = data.frame(sunspots)
> Year = filter(sun.df, sunspots == "Year")
to isolate the Year data, and it works, but I am unsure of how to then place it in a column.
Any suggestions?
Try this:
library(tidyverse)
df <- read_csv("test.csv",col_names = FALSE)
df
# A tibble: 6 x 4
# X1 X2 X3 X4
# <chr> <dbl> <dbl> <dbl>
# 1 Year 123 124 125
# 2 Number 1 2 3
# 3 Year 126 127 128
# 4 Number 4 5 6
# 5 Year 129 130 131
# 6 Number 7 8 9
# Removing first column and transpose it to get a dataframe of numbers
df_number <- as.data.frame(as.matrix(t(df[,-1])),row.names = FALSE)
df_number
# V1 V2 V3 V4 V5 V6
# 1 123 1 126 4 129 7
# 2 124 2 127 5 130 8
# 3 125 3 128 6 131 9
# Keep the first two column (V1,V2) and assign column names
df_new <- df_number[1:2]
colnames(df_new) <- c("Year","Number")
# Iterate and rbind with subsequent columns (2 by 2) to df_new
for(i in 1:((ncol(df_number) - 2 )/2)) {
df_mini <- df_number[(i*2+1):(i*2+2)]
colnames(df_mini) <- c("Year","Number")
df_new <- rbind(df_new,df_mini)
}
df_new
# Year Number
# 1 123 1
# 2 124 2
# 3 125 3
# 4 126 4
# 5 127 5
# 6 128 6
# 7 129 7
# 8 130 8
# 9 131 9

Pivot / Reshape data [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 2 years ago.
Improve this question
My sample data looks like this:
data <- read.table(header=T, text='
pid measurement1 Tdays1 measurement2 Tdays2 measurement3 Tdays3 measurment4 Tdays4
1 1356 1435 1483 1405 1563 1374 NA NA
2 943 1848 1173 1818 1300 1785 NA NA
3 1590 185 NA NA NA NA 1585 294
4 130 72 443 70 NA NA 136 79
4 140 82 NA NA NA NA 756 89
4 220 126 266 124 NA NA 703 128
4 166 159 213 156 476 145 776 166
4 380 189 583 173 NA NA 586 203
4 353 231 510 222 656 217 526 240
4 180 268 NA NA NA NA NA NA
4 NA NA NA NA NA NA 580 278
4 571 334 596 303 816 289 483 371
')
Now i would like it to look something like this:
PID Time (days) Value
1 1435 1356
1 1405 1483
1 1374 1563
2 1848 943
2 1818 1173
2 1785 1300
3 185 1590
... ... ...
How would i tend to get there? I have looked up some things about wide to longformat, but it doesn't seem to do the trick.
Kind regards, and thank you in advance.
Here is a base R option
u <- cbind(
data[1],
do.call(
rbind,
lapply(
split.default(data[-1], ceiling(seq_along(data[-1]) / 2)),
setNames,
c("Value", "Time")
)
)
)
out <- `row.names<-`(
subset(
x <- u[order(u$pid), ],
complete.cases(x)
), NULL
)
such that
> out
pid Value Time
1 1 1356 1435
2 1 1483 1405
3 1 1563 1374
4 2 943 1848
5 2 1173 1818
6 2 1300 1785
7 3 1590 185
8 3 1585 294
9 4 130 72
10 4 140 82
11 4 220 126
12 4 166 159
13 4 380 189
14 4 353 231
15 4 180 268
16 4 571 334
17 4 443 70
18 4 266 124
19 4 213 156
20 4 583 173
21 4 510 222
22 4 596 303
23 4 476 145
24 4 656 217
25 4 816 289
26 4 136 79
27 4 756 89
28 4 703 128
29 4 776 166
30 4 586 203
31 4 526 240
32 4 580 278
33 4 483 371
An option with pivot_longer
library(dplyr)
library(tidyr)
names(data)[8] <- "measurement4"
data %>%
pivot_longer(cols = -pid, names_to = c('.value', 'grp'),
names_sep = "(?<=[a-z])(?=[0-9])", values_drop_na = TRUE) %>% select(-grp)
# A tibble: 33 x 3
# pid measurement Tdays
# <int> <int> <int>
# 1 1 1356 1435
# 2 1 1483 1405
# 3 1 1563 1374
# 4 2 943 1848
# 5 2 1173 1818
# 6 2 1300 1785
# 7 3 1590 185
# 8 3 1585 294
# 9 4 130 72
#10 4 443 70
# … with 23 more rows

how to make stacked ggplot

want to make stacked ggplot for timeseries
>air2
dayofmonth total dept
1 1 1107 380
2 2 918 92
3 3 1089 113
4 4 1086 235
5 5 1063 218
6 6 1084 325
7 7 1129 180
8 8 1133 166
9 9 918 72
10 10 1088 214
11 11 1114 180
12 12 1047 195
13 13 1110 421
14 14 1165 216
15 15 1174 228
16 16 1010 115
I tried this but didnt get the expected graph:
mdata <- melt(air2,id=c("dayofmonth"))
ggplot(mdata, aes(x=Time,y=value,group=variable,fill=variable)) +
geom_area(position="fill")

Generating Stacked bar plots

I have a dataframe with 3 columns
$x -- at http://pastebin.com/SGrRUJcA
$y -- at http://pastebin.com/fhn7A1rj
$z -- at http://pastebin.com/VmVvdHEE
that I wish to use to generate a stacked barplot. All of these columns hold integer data. The stacked barplot should have the levels along the x-axis and the data for each level along the y-axis. The stacks should then correspond to each of $x, $y and $z.
UPDATE: I now have the following:
counted <- data.frame(table(myDf$x),variable='x')
counted <- rbind(counted,data.frame(table(myDf$y),variable='y'))
counted <- rbind(counted,data.frame(table(myDf$z),variable='z'))
counted <- counted[counted$Var1!=0,] # to get rid of 0th level??
stackedBp <- ggplot(counted,aes(x=Var1,y=Freq,fill=variable))
stackedBp <- stackedBp+geom_bar(stat='identity')+scale_x_discrete('Levels')+scale_y_continuous('Frequency')
stackedBp
which generates:
.
Two issues remain:
the x-axis labeling is not correct. For some reason, it goes: 46, 47, 53, 54, 38, 40.... How can I order it naturally?
I also wish to remove the 0th label.
I've tried using +scale_x_discrete(breaks = 0:50, labels = 1:50) but this doesn't work.
NB. axis labeling issue: Dataframe column appears incorrectly sorted
Not completely sure what you're wanting to see... but reading ?barplot says the first argument, height must be a vector or matrix. So to fix your initial error:
myDf <- data.frame(x=sample(1:10,100,replace=T),y=sample(11:20,100,replace=T),z=1:10)
barplot(as.matrix(myDf))
If you provide a reproducible example and a more specific description of your desired output you can get a better answer.
Or if I were to guess wildly (and use ggplot)...
myDf <- data.frame(x=sample(1:10,100,replace=T),y=sample(11:20,100,replace=T),z=1:10)
myDf.counted<- data.frame(table(myDf$x),variable='x')
myDf.counted <- rbind(myDf.counted,data.frame(table(myDf$y),variable='y'))
myDf.counted <- rbind(myDf.counted,data.frame(table(myDf$z),variable='z'))
ggplot(myDf.counted,aes(x=Var1,y=Freq,fill=variable))+geom_bar(stat='identity')
I'm surprised that didn't blow up in your face. Cross-classifying the joint occurrence of three different vectors each of length 35204 would often consume many gigabytes of RAM (and would possibly create lots of useless 0's as you found). Maybe you wanted to examine instead the results of sapply(myDf, table)? This then creates three separate tables of counts.
It's a rather irregular result and would need further work to get it into a matrix form but you might want to consider using densityplot to display the comparative distributions which I think is your goal.
$x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
126 711 1059 2079 3070 2716 2745 3329 2916 2671 2349 2457 2055 1303 892 692
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
559 799 482 299 289 236 156 145 100 95 121 133 60 34 37 13
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
15 12 56 10 4 7 2 14 13 28 30 20 16 62 74 58
49 50
40 15
$y
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
3069 32 1422 1376 1780 1556 1937 1844 1967 1699 1910 1924 1047 894 975 865
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
635 1002 710 908 979 848 678 908 696 491 417 412 499 411 421 217
32 33 34 35 36 37 39 42 46 47 53 54
265 182 121 47 38 11 2 2 1 1 1 4
$z
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
31 202 368 655 825 1246 900 1136 1098 1570 1613 1144 1107 1037 1239 1372
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1306 1085 843 867 813 1057 1213 1020 1210 939 725 644 617 602 739 584
32 33 34 35 36 37 38 39 40 41 42 43
650 733 756 681 684 657 544 416 220 48 7 1
The density plot is really simple to create in lattice:
densityplot( ~x+y+z, myDf)

Resources