Grouping variables with barplot in RStudio

Income1.csv
Age.Group X X.1 X.2 X.3 X.4
1 Income 16-24 25-34 35-44 45-54 55+
2 Low 13.9 17.4 14.9 11.9 10.9
3 Medium 26.3 46.9 42.2 30.7 21.5
4 High 11.6 19.7 22.4 17.4 6.7
How do you create a grouped barplot with the bars grouped by age group?

Read your data:
d <- structure(list(Income = structure(c(2L, 3L, 1L), .Label = c("High",
"Low", "Medium"), class = "factor"), `16-24` = c(13.9, 26.3,
11.6), `25-34` = c(17.4, 46.9, 19.7), `35-44` = c(14.9, 42.2,
22.4), `45-54` = c(11.9, 30.7, 17.4), `55+` = c(10.9, 21.5, 6.7
)), .Names = c("Income", "16-24", "25-34", "35-44", "45-54",
"55+"), class = "data.frame", row.names = c(NA, -3L))
Plot your data: beside = TRUE places the bars side by side instead of stacking them.
barplot(as.matrix(d[,-1]), beside = TRUE, legend.text = d$Income)
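If the goal is to read Income1.csv directly rather than retype it, the question's printout (columns Age.Group, X, X.1 and so on, with the real header showing up as row 1) suggests the file has an extra line above the real header. Assuming that layout, a minimal sketch reads it with skip = 1 and plots the result; the inline CSV text is a stand-in for the actual file.

```r
# Stand-in for Income1.csv (assumed layout: one title line, then the real header)
csv_text <- "Age Group,,,,,
Income,16-24,25-34,35-44,45-54,55+
Low,13.9,17.4,14.9,11.9,10.9
Medium,26.3,46.9,42.2,30.7,21.5
High,11.6,19.7,22.4,17.4,6.7"

# skip = 1 drops the title line; check.names = FALSE keeps "16-24", "55+" as-is
income <- read.csv(text = csv_text, skip = 1, check.names = FALSE)

# Rows = income levels, columns = age groups
m <- as.matrix(income[, -1])
rownames(m) <- income$Income

# beside = TRUE draws one cluster of bars per age group;
# legend.text = TRUE builds the legend from rownames(m)
barplot(m, beside = TRUE, legend.text = TRUE,
        xlab = "Age group", ylab = "Percent")
```

With the real file, replace text = csv_text with the file path.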

Related

Converting values to special character in summary

I have created a summary table like the one below:
Name Sales
AS 71.5%
DY 88.4%
VH 44.6%
MY 86.9%
HU 42.3%
TT 67.2%
BG 0.0%
SA 85.3%
Now I want to replace the 0.0% entries with "-".
I have tried:
tab[,2] <- paste0(tab[,2],"%")
tab[,2] <- replace(tab[,2],tab[,2]<0,"-")
but it's also converting values like 8.0 and 7.0 to "-".
Is there another solution?
The output should look like:
Name Sales
AS 71.5%
DY 88.4%
BG -
In the full table there are three sales columns for each person.
You can try this:
#Data
df <- structure(list(Name = structure(c(1L, 3L, 8L, 5L, 4L, 7L, 2L,
6L), .Label = c("AS", "BG", "DY", "HU", "MY", "SA", "TT", "VH"
), class = "factor"), Sales = c(71.5, 88.4, 44.6, 86.9, 42.3,
67.2, 0, 85.3)), class = "data.frame", row.names = c(NA, -8L))
#Code
index <- which(df$Sales==0)
df$Sales[index] <- '-'
Name Sales
1 AS 71.5
2 DY 88.4
3 VH 44.6
4 MY 86.9
5 HU 42.3
6 TT 67.2
7 BG -
8 SA 85.3
Update with new data
New data has been provided:
df2 <- structure(list(Name = c("AS", "DY", "VH", "MY", "HU", "TT", "BG",
"SA"), Sales = c("71.5%", "88.4%", "44.6%", "86.9%", "42.3%",
"67.2%", "0.0%", "85.3%")), class = "data.frame", row.names = c(NA,
-8L))
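# Anchored regex: only a value that is exactly "0.0%" becomes "-" (a fixed "0.0%" would also hit "10.0%")
df2$Sales2 <- sub("^0\\.0%$", "-", df2$Sales)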
Name Sales Sales2
1 AS 71.5% 71.5%
2 DY 88.4% 88.4%
3 VH 44.6% 44.6%
4 MY 86.9% 86.9%
5 HU 42.3% 42.3%
6 TT 67.2% 67.2%
7 BG 0.0% -
8 SA 85.3% 85.3%
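One caveat with a fixed pattern: "0.0%" also matches inside values such as "10.0%" or "20.0%". A short base R check with made-up example values compares the fixed pattern to an anchored regular expression that only matches the whole string.

```r
sales <- c("71.5%", "10.0%", "0.0%", "20.0%")

# Fixed substring match: also rewrites the tail of "10.0%" and "20.0%"
bad <- gsub("0.0%", "-", sales, fixed = TRUE)

# Anchored regex: ^ and $ require the whole value to be "0.0%"
good <- sub("^0\\.0%$", "-", sales)

bad   # gives "71.5%" "1-" "-" "2-"
good  # gives "71.5%" "10.0%" "-" "20.0%"
```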
Update with variable
Using first data df:
# Exact zeros become "-", everything else gets a % sign
df$tab <- ifelse(df$Sales == 0, "-", paste0(df$Sales, "%"))
Name Sales tab
1 AS 71.5 71.5%
2 DY 88.4 88.4%
3 VH 44.6 44.6%
4 MY 86.9 86.9%
5 HU 42.3 42.3%
6 TT 67.2 67.2%
7 BG 0.0 -
8 SA 85.3 85.3%
Try this:
tab$Sales <- replace(tab$Sales, which(tab$Sales == 0), "-")
I'd also recommend looking into dplyr's mutate.
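If Sales is still numeric, another option is to build the percentage strings and the "-" in one step. Below is a minimal sketch with a hypothetical helper, format_sales(); sprintf() keeps one decimal place, so 8 becomes "8.0%" rather than "-" or "8%". The input values are illustrative.

```r
# Hypothetical helper: numeric sales -> percentage strings, exact zeros -> "-"
format_sales <- function(x, digits = 1) {
  # sprintf keeps the trailing ".0" that paste0() would drop (8 -> "8.0%")
  out <- sprintf(paste0("%.", digits, "f%%"), x)
  # which() skips NA comparisons, so NA input does not break the assignment
  out[which(x == 0)] <- "-"
  out
}

format_sales(c(71.5, 88.4, 0, 8, 7))
# gives "71.5%" "88.4%" "-" "8.0%" "7.0%"
```

Applied to the table, this would be tab$Sales <- format_sales(tab$Sales), assuming Sales has not yet been converted to text.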

join files based on one column and different orders in R or python

I have three tab-separated files like the following examples:
file 1:
AB 45.2 4.56 0.21
FG 78.1 54.1 36.1
HG 98.1 25.0 12.6
TR 1.2 3.25 65.1
file 2:
TR 5.2 41.6 10.21
HG 8.1 23.1 56.1
FG 9 32.0 32.6
AB 12.2 31.25 5.1
file 3:
HG 15.2 21.6 20.21
TR 31.1 32.1 66.1
AB 12.1 12.0 62.6
FG 11.3 31.25 54.1
The first column in all of them has the same items, but in a different order. I want to join the files on the first column and make a new file like the expected output.
Expected output:
AB 45.2 4.56 0.21 12.2 31.25 5.1 12.1 12.0 62.6
FG 78.1 54.1 36.1 9 32.0 32.6 11.3 31.25 54.1
HG 98.1 25.0 12.6 8.1 23.1 56.1 15.2 21.6 20.21
TR 1.2 3.25 65.1 5.2 41.6 10.21 31.1 32.1 66.1
I tried the join functions in R and pandas, but they do not return the expected output. Do you know how I can do this in Python or R?
In R, you could use Reduce and merge
Reduce(function(x, y) merge(x, y, by = 'V1'), list(df1, df2, df3))
#If there are a lot of data frames, use `mget` and `ls`
#Reduce(function(x, y) merge(x, y, by = 'V1'), mget(ls(pattern = "df\\d+")))
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#1 AB 45.2 4.56 0.21 12.2 31.2 5.1 12.1 12.0 62.6
#2 FG 78.1 54.10 36.10 9.0 32.0 32.6 11.3 31.2 54.1
#3 HG 98.1 25.00 12.60 8.1 23.1 56.1 15.2 21.6 20.2
#4 TR 1.2 3.25 65.10 5.2 41.6 10.2 31.1 32.1 66.1
Data (one column, V1, is common to all data frames and the rest have different names):
df1 <- structure(list(V1 = structure(1:4, .Label = c("AB", "FG", "HG",
"TR"), class = "factor"), V2 = c(45.2, 78.1, 98.1, 1.2), V3 = c(4.56,
54.1, 25, 3.25), V4 = c(0.21, 36.1, 12.6, 65.1)),
class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(V1 = structure(4:1, .Label = c("AB", "FG", "HG",
"TR"), class = "factor"), V5 = c(5.2, 8.1, 9, 12.2), V6 = c(41.6,
23.1, 32, 31.25), V7 = c(10.21, 56.1, 32.6, 5.1)), class = "data.frame",
row.names = c(NA, -4L))
df3 <- structure(list(V1 = structure(c(3L, 4L, 1L, 2L), .Label = c("AB",
"FG", "HG", "TR"), class = "factor"), V8 = c(15.2, 31.1, 12.1,
11.3), V9 = c(21.6, 32.1, 12, 31.25), V10 = c(20.21, 66.1, 62.6,
54.1)), class = "data.frame", row.names = c(NA, -4L))
We can use reduce with inner_join from the tidyverse (in R):
library(dplyr)
library(purrr)
mget(paste0('df', 1:3)) %>%
reduce(inner_join, by = "V1")
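The answers here start from data frames whose value columns already have distinct names. When the three tab-separated files are read with header = FALSE, every file gets V1 to V4, so a sketch that renames the value columns before merging may help. The file contents are copied from the question and written to temporary files; the column names key, f1_1, f1_2 and so on are illustrative.

```r
# Write the three example files from the question to temporary .tsv files
contents <- list(
  c("AB\t45.2\t4.56\t0.21", "FG\t78.1\t54.1\t36.1", "HG\t98.1\t25.0\t12.6", "TR\t1.2\t3.25\t65.1"),
  c("TR\t5.2\t41.6\t10.21", "HG\t8.1\t23.1\t56.1", "FG\t9\t32.0\t32.6", "AB\t12.2\t31.25\t5.1"),
  c("HG\t15.2\t21.6\t20.21", "TR\t31.1\t32.1\t66.1", "AB\t12.1\t12.0\t62.6", "FG\t11.3\t31.25\t54.1")
)
paths <- vapply(seq_along(contents), function(i) {
  p <- tempfile(fileext = ".tsv")
  writeLines(contents[[i]], p)
  p
}, character(1))

# Read each file without a header and give its value columns unique names
tables <- lapply(seq_along(paths), function(i) {
  d <- read.delim(paths[i], header = FALSE)
  names(d) <- c("key", paste0("f", i, "_", seq_len(ncol(d) - 1)))
  d
})

# merge() matches rows by key regardless of order (and sorts by key)
joined <- Reduce(function(x, y) merge(x, y, by = "key"), tables)

# Write the result as a new tab-separated file without a header
out_path <- tempfile(fileext = ".tsv")
write.table(joined, out_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```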

Connecting two sets of coordinates to create lines using sf/mapview

I have a dataset where each bird was captured at one location (Blong, Blat) and then encountered again at another (Elong, Elat). These coordinates are in lat/long format, and I'd like to connect the capture and encounter locations with a line and plot them in mapview.
In the data below, each row is an individual bird with its capture/encounter coordinates and the flyway it belongs to (which I would like to use to color the lines in mapview).
dat <- structure(list(Blong = c(-75.58333, -76.08333, -81.08333, -94.25,
-75.41667, -99.41667, -77.41667, -116.08333, -89.58333, -77.58333
), Blat = c(37.58333, 40.58333, 42.75, 41.91667, 38.25, 28.25,
38.91667, 43.58333, 44.25, 38.91667), Elong = c(-65.91667, -75.75,
-80.58333, -95.41667, -73.58333, -89.41667, -77.58333, -116.41667,
-96.41667, -77.41667), Elat = c(45.91667, 40.58333, 42.75, 29.75,
45.58333, 48.25, 38.75, 43.58333, 34.08333, 38.91667), Flyway = structure(c(2L,
2L, 2L, 1L, 2L, 2L, 2L, 3L, 2L, 2L), .Label = c("Central", "Eastern",
"West"), class = "factor")), .Names = c("Blong", "Blat", "Elong",
"Elat", "Flyway"), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
A look at the data:
# A tibble: 10 x 5
Blong Blat Elong Elat Flyway
<dbl> <dbl> <dbl> <dbl> <fct>
1 -75.6 37.6 -65.9 45.9 Eastern
2 -76.1 40.6 -75.8 40.6 Eastern
3 -81.1 42.8 -80.6 42.8 Eastern
4 -94.2 41.9 -95.4 29.8 Central
5 -75.4 38.2 -73.6 45.6 Eastern
6 -99.4 28.2 -89.4 48.2 Eastern
7 -77.4 38.9 -77.6 38.8 Eastern
8 -116. 43.6 -116. 43.6 West
9 -89.6 44.2 -96.4 34.1 Eastern
10 -77.6 38.9 -77.4 38.9 Eastern
I've looked at a few examples, but haven't found one that looks quite like my data set.
The tricky thing is to create a valid LINESTRING object from the coordinate pairs in wide format: sf expects linestring coordinates as the rows of a matrix. Here's a way that works. The sfc column of an sf object is a list, so here we use lapply to loop over the rows of the data you provided.
library(sf)
library(mapview)
b = dat[, c("Blong", "Blat")]
names(b) = c("long", "lat")
e = dat[, c("Elong", "Elat")]
names(e) = c("long", "lat")
dat$geometry = do.call(
"c",
lapply(seq(nrow(b)), function(i) {
st_sfc(
st_linestring(
as.matrix(
rbind(b[i, ], e[i, ])
)
),
crs = 4326
)
}))
dat_sf = st_as_sf(dat)
mapview(dat_sf, zcol = "Flyway")
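The core of this answer is turning each wide row into a 2 x 2 matrix (capture point, then encounter point). Below is a minimal sketch of that step in base R with Map(), using the first three birds from the question; make_segments() is a hypothetical helper name, and the sf part only runs if the package is installed.

```r
# Hypothetical helper: one 2 x 2 matrix per bird.
# Column 1 = longitude, column 2 = latitude; row 1 = capture, row 2 = encounter
make_segments <- function(blong, blat, elong, elat) {
  Map(function(x1, y1, x2, y2) matrix(c(x1, x2, y1, y2), ncol = 2),
      blong, blat, elong, elat)
}

birds <- data.frame(
  Blong  = c(-75.58333, -76.08333, -81.08333),
  Blat   = c(37.58333, 40.58333, 42.75),
  Elong  = c(-65.91667, -75.75, -80.58333),
  Elat   = c(45.91667, 40.58333, 42.75),
  Flyway = c("Eastern", "Eastern", "Eastern")
)

segs <- make_segments(birds$Blong, birds$Blat, birds$Elong, birds$Elat)

# With sf available, each matrix becomes one LINESTRING in WGS84 (EPSG:4326)
if (requireNamespace("sf", quietly = TRUE)) {
  geom <- sf::st_sfc(lapply(segs, sf::st_linestring), crs = 4326)
  birds_sf <- sf::st_sf(birds, geometry = geom)
  # mapview::mapview(birds_sf, zcol = "Flyway")
}
```

Building a list of sfg objects and passing it to st_sfc() once avoids combining many one-element sfc objects with do.call("c", ...).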

Column from one data.table ordered by column from other data.table

I have two data.tables A:
contract.name contract.start contract.end price
Q1-2019 2019-01-01 2019-04-01 10
Q2-2019 2019-04-01 2019-07-01 12
Q3-2019 2019-07-01 2019-10-01 11
Q4-2019 2019-10-01 2020-01-01 13
and B:
contract delivery.begin delivery.end bid ask
Q2-2018 2018-04-01 2018-06-30 9.8 10.5
Q3-2018 2018-07-01 2018-09-30 11.5 12.1
Q4-2018 2018-10-01 2018-12-31 10.5 11.3
Q1-2019 2019-01-01 2019-03-31 12.8 13.5
I want a vector with the bid values from B ordered by the contract.name values from A like so:
bid = c(12.8, 0, 0, 0)
library(dplyr)
library(tidyr)
df1 %>%
left_join(df2, by=c("contract.name"="contract")) %>%
mutate(bid = replace_na(bid, 0)) %>%
pull(bid)
Output is:
[1] 12.8 0.0 0.0 0.0
Sample data:
df1 <- structure(list(contract.name = c("Q1-2019", "Q2-2019", "Q3-2019",
"Q4-2019"), contract.start = c("2019-01-01", "2019-04-01", "2019-07-01",
"2019-10-01"), contract.end = c("2019-04-01", "2019-07-01", "2019-10-01",
"2020-01-01"), price = c(10L, 12L, 11L, 13L)), .Names = c("contract.name",
"contract.start", "contract.end", "price"), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(contract = c("Q2-2018", "Q3-2018", "Q4-2018",
"Q1-2019"), delivery.begin = c("2018-04-01", "2018-07-01", "2018-10-01",
"2019-01-01"), delivery.end = c("2018-06-30", "2018-09-30", "2018-12-31",
"2019-03-31"), bid = c(9.8, 11.5, 10.5, 12.8), ask = c(10.5,
12.1, 11.3, 13.5)), .Names = c("contract", "delivery.begin",
"delivery.end", "bid", "ask"), class = "data.frame", row.names = c(NA,
-4L))
library(data.table)
DT.A <- data.table(structure(list(contract.name = structure(1:4, .Label = c("Q1-2019",
"Q2-2019", "Q3-2019", "Q4-2019"), class = "factor"), contract.start = structure(1:4, .Label = c("2019-01-01",
"2019-04-01", "2019-07-01", "2019-10-01"), class = "factor"),
contract.end = structure(1:4, .Label = c("2019-04-01", "2019-07-01",
"2019-10-01", "2020-01-01"), class = "factor"), price = c(10L,
12L, 11L, 13L)), .Names = c("contract.name", "contract.start",
"contract.end", "price"), class = "data.frame", row.names = c(NA,
-4L)))
DT.B <- data.table(structure(list(contract = structure(c(2L, 3L, 4L, 1L), .Label = c("Q1-2019",
"Q2-2018", "Q3-2018", "Q4-2018"), class = "factor"), delivery.begin = structure(1:4, .Label = c("2018-04-01",
"2018-07-01", "2018-10-01", "2019-01-01"), class = "factor"),
delivery.end = structure(1:4, .Label = c("2018-06-30", "2018-09-30",
"2018-12-31", "2019-03-31"), class = "factor"), bid = c(9.8,
11.5, 10.5, 12.8), ask = c(10.5, 12.1, 11.3, 13.5)), .Names = c("contract",
"delivery.begin", "delivery.end", "bid", "ask"), class = "data.frame", row.names = c(NA,
-4L)))
# Get vector of contract names
orderVals <- DT.A$contract.name
# Key table B by contract
setkey(DT.B, contract)
# Extract rows from table B with the specified key values
output <- DT.B[.(orderVals)]
# Change the values where there was no match from NA to 0
output[is.na(bid), bid := 0]
# Get desired vector
output$bid
You can do:
library("data.table")
A <- fread(
"contract.name contract.start contract.end price
Q1-2019 2019-01-01 2019-04-01 10
Q2-2019 2019-04-01 2019-07-01 12
Q3-2019 2019-07-01 2019-10-01 11
Q4-2019 2019-10-01 2020-01-01 13")
B <- fread(
"contract delivery.begin delivery.end bid ask
Q2-2018 2018-04-01 2018-06-30 9.8 10.5
Q3-2018 2018-07-01 2018-09-30 11.5 12.1
Q4-2018 2018-10-01 2018-12-31 10.5 11.3
Q1-2019 2019-01-01 2019-03-31 12.8 13.5")
setnames(B, "contract", "contract.name")
A[B, on="contract.name", bid:=bid][, ifelse(is.na(bid), 0, bid)]
# > A[B, on="contract.name", bid:=bid][, ifelse(is.na(bid), 0, bid)]
# [1] 12.8 0.0 0.0 0.0
or (a variant without ifelse()):
setnames(B, "contract", "contract.name")
A[B, on="contract.name", bid:=bid]
A[is.na(bid), bid:=0][, bid]
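For comparison, the same lookup works in base R with match(), shown here on trimmed copies of A and B (only the columns needed). match() returns B's row for each contract name in A, in A's order, with NA where there is no match.

```r
A <- data.frame(contract.name = c("Q1-2019", "Q2-2019", "Q3-2019", "Q4-2019"))
B <- data.frame(contract = c("Q2-2018", "Q3-2018", "Q4-2018", "Q1-2019"),
                bid = c(9.8, 11.5, 10.5, 12.8))

# Row of B for each contract in A (NA when B has no such contract)
bid <- B$bid[match(A$contract.name, B$contract)]
# Unmatched contracts get 0, as in the desired output
bid[is.na(bid)] <- 0
bid
```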

Picking a number from vector and assign to column based on multiple conditions in R

I need to add a Thickness column to my Products table based on multiple conditions.
1. Thickness should be one of these values:
Plate_Thickness <- c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)
2. Thickness should be between the ThicknessMin and ThicknessMax values already in the table.
Current table looks like this:
Product ThicknessMin ThicknessMax
P0001 0 8
P0002 31.01 70
P0003 8.01 31
P0004 70.01 999
P0005 8.01 31
So the idea is to pick a Thickness value from the vector at random, but it must lie between ThicknessMin and ThicknessMax. Any pointers on how to go about this would be appreciated. Thanks.
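A vectorized base R solution (df is your data.frame; Plate_Thickness must be sorted and every row needs at least one eligible value):
set.seed(1) #just for reproducibility
#index of the first plate >= ThicknessMin
a<-findInterval(df$ThicknessMin,Plate_Thickness,left.open=TRUE)+1
#index of the last plate <= ThicknessMax
b<-findInterval(df$ThicknessMax,Plate_Thickness)
#pick a random index between a and b (inclusive) for each row
Plate_Thickness[runif(length(a)) %/% (1/(b-a+1))+a]
#[1] 5.8 32.5 27.1 120.4 25.1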
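A deterministic sketch (no sampling) of how findInterval() can give the first and last eligible index per row, using the question's values. The names thick_min and thick_max are illustrative, and the approach assumes Plate_Thickness is sorted in increasing order.

```r
Plate_Thickness <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)  # must be sorted
thick_min <- c(0, 31.01, 8.01, 70.01, 8.01)
thick_max <- c(8, 70, 31, 999, 31)

# left.open = TRUE counts plates strictly below the minimum,
# so + 1 is the index of the first plate >= ThicknessMin
lo <- findInterval(thick_min, Plate_Thickness, left.open = TRUE) + 1
# The default counts plates <= the maximum, i.e. the last eligible index
hi <- findInterval(thick_max, Plate_Thickness)

# Each row needs at least one eligible plate
stopifnot(all(lo <= hi))

# Eligible plates for row i are Plate_Thickness[lo[i]:hi[i]]
cbind(lo, hi)
```

For P0003 (8.01 to 31) this gives indices 2 to 3, i.e. 25.1 or 27.1, and never 5.8.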
Your data
Plate_Thickness <- c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)
df <- structure(list(Product = c("P0001", "P0002", "P0003", "P0004",
"P0005"), ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L,
70L, 31L, 999L, 31L), Plate_Thickness = c(5.8, 32.5, 27.1, 120.4,
25.1)), .Names = c("Product", "ThicknessMin", "ThicknessMax",
"Plate_Thickness"), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
solution
library(dplyr)
acceptable_vals <- lapply(1:nrow(df), function(x) Plate_Thickness[between(Plate_Thickness, df$ThicknessMin[x], df$ThicknessMax[x])])
set.seed(1)
df$Plate_Thickness <- sapply(acceptable_vals, function(x) x[sample(1:length(x), 1)])
Output
Product ThicknessMin ThicknessMax Plate_Thickness
1: P0001 0.00 8 5.8
2: P0002 31.01 70 32.5
3: P0003 8.01 31 27.1
4: P0004 70.01 999 120.4
5: P0005 8.01 31 25.1
We can use the rowwise function from the dplyr package to sample from the Plate_Thickness vector. Within the call to sample, we sample only from elements of Plate_Thickness which are between ThicknessMin and ThicknessMax. Because sample(x, 1) draws from 1:x when x is a single number (P0001 has only one eligible value, 5.8), we sample an index rather than the value itself. I put your table in a data.frame called dat:
library(dplyr)
set.seed(123)
dat %>%
rowwise() %>%
mutate(thick_sample = {
ok <- Plate_Thickness[between(Plate_Thickness, ThicknessMin, ThicknessMax)]
ok[sample(length(ok), 1)]
})
P0001 always gets 5.8, its only eligible value; the samples for the other rows depend on the seed.
Data (for reproducibility)
dat <- structure(list(Product = structure(1:5, .Label = c("P0001", "P0002",
"P0003", "P0004", "P0005"), class = "factor"), ThicknessMin = c(0,
31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L, 70L, 31L, 999L,
31L)), .Names = c("Product", "ThicknessMin", "ThicknessMax"), class = "data.frame", row.names = c(NA,
-5L))
#DATA
df = structure(list(Product = c("P0001", "P0002", "P0003", "P0004",
"P0005"), ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L,
70L, 31L, 999L, 31L)), .Names = c("Product", "ThicknessMin",
"ThicknessMax"), class = c("data.table", "data.frame"), row.names = c(NA,
-5L))
Plate_Thickness = c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)
set.seed(1)
apply(X = df[c("ThicknessMin", "ThicknessMax")],
MARGIN = 1, #Run FUN on each row of X
FUN = function(x) {
#Retain only eligible values for each row (bounds inclusive)
ok <- Plate_Thickness[Plate_Thickness >= x[1] & Plate_Thickness <= x[2]]
#Sample an index, not the value: sample(5.8, 1) would draw from 1:5
ok[sample(length(ok), 1)]
})
#P0001 always returns 5.8; the other values depend on the seed
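Sampling from the eligible values needs care when only one value qualifies: P0001 allows only 5.8, and sample(5.8, 1) samples from 1:5, which can return a value such as 2.0. A base R sketch that samples an index instead; pick_one() is a hypothetical helper name, and the bounds are treated as inclusive.

```r
Plate_Thickness <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)
df <- data.frame(Product = c("P0001", "P0002", "P0003", "P0004", "P0005"),
                 ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01),
                 ThicknessMax = c(8, 70, 31, 999, 31))

# Hypothetical helper: index-based sampling is safe for length-1 input
# (it errors if x is empty, i.e. when no plate fits the range)
pick_one <- function(x) x[sample.int(length(x), 1)]

set.seed(1)
df$Plate_Thickness <- unlist(Map(function(lo, hi) {
  pick_one(Plate_Thickness[Plate_Thickness >= lo & Plate_Thickness <= hi])
}, df$ThicknessMin, df$ThicknessMax))
df
```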
