I would like to capture the column index of every value less than 500 in a series of data.
Below is what my data looks like:
Category,Price1,Price2,Price3,Price4,Price5,Price6
Product1,967,855,929,811,501,387
Product2,526,809,723,304,315,671
Product3,412,133,369,930,400,337
Product4,709,241,625,822,967,952
Product5,395,506,110,280,829,817
Product6,803,618,794,214,605,788
For example, in the first row, Price6 is the first element in the series Price1 to Price6 whose value is less than 500, hence "First" is 6 in the output.
Similarly, in the second row, Price4 is less than 500 and so is Price5, hence First and Second are 4 and 5 respectively.
When the logic captures nothing, I want to place a "-" instead.
Below is the output I am looking for:
Category,Price1,Price2,Price3,Price4,Price5,Price6,First,Second,Third,Fourth,Fifth,Sixth
Product1,967,855,929,811,501,387,6,-,-,-,-,-
Product2,526,809,723,304,315,671,4,5,-,-,-,-
Product3,412,133,369,930,400,337,1,2,3,5,6,-
Product4,709,241,625,822,967,952,2,-,-,-,-,-
Product5,395,506,110,280,829,817,1,3,4,-,-,-
Product6,803,618,794,214,605,788,4,-,-,-,-,-
I am not sure how to do this in R or Excel.
Any leads would be highly appreciated.
Thanks,
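Using data.table:
library(data.table)
# for each Category, collect the positions (1-6) of the prices below 500
dt[, when := melt(dt, id.vars = "Category")[, toString(which(value < 500)), Category][, V1]]
# split the positions into separate columns, padding short rows with "-"
cbind(dt, dt[, tstrsplit(when, ", ", fill = "-")])
This gives: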
   Category Price1 Price2 Price3 Price4 Price5 Price6          when V1 V2 V3 V4 V5
1: Product1    967    855    929    811    501    387             6  6  -  -  -  -
2: Product2    526    809    723    304    315    671          4, 5  4  5  -  -  -
3: Product3    412    133    369    930    400    337 1, 2, 3, 5, 6  1  2  3  5  6
4: Product4    709    241    625    822    967    952             2  2  -  -  -  -
5: Product5    395    506    110    280    829    817       1, 3, 4  1  3  4  -  -
6: Product6    803    618    794    214    605    788             4  4  -  -  -  -
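Now you just need to rename the columns V1-V5 and drop the column when. Since no row has six matches, tstrsplit produces only five columns; add a sixth one filled with "-" if you need it.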
Data:
dt <- fread("Category,Price1,Price2,Price3,Price4,Price5,Price6
Product1,967,855,929,811,501,387
Product2,526,809,723,304,315,671
Product3,412,133,369,930,400,337
Product4,709,241,625,822,967,952
Product5,395,506,110,280,829,817
Product6,803,618,794,214,605,788")
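The remaining renaming step could look roughly like the sketch below. It reuses the answer's own calls; the names labels, idx, missing_cols and result are introduced here only for illustration. It assumes every row has at least one value below 500 (a row without matches would yield an empty string rather than "-" in the first column).

```r
library(data.table)

dt <- fread("Category,Price1,Price2,Price3,Price4,Price5,Price6
Product1,967,855,929,811,501,387
Product2,526,809,723,304,315,671
Product3,412,133,369,930,400,337
Product4,709,241,625,822,967,952
Product5,395,506,110,280,829,817
Product6,803,618,794,214,605,788")

# Target column names (taken from the desired output in the question)
labels <- c("First", "Second", "Third", "Fourth", "Fifth", "Sixth")

# Same step as in the answer: positions of the prices below 500, per Category
dt[, when := melt(dt, id.vars = "Category")[, toString(which(value < 500)), Category][, V1]]

# Split into columns (V1, V2, ...) and pad short rows with "-"
idx <- dt[, tstrsplit(when, ", ", fill = "-")]
setnames(idx, labels[seq_along(idx)])

# Add any label that tstrsplit did not produce (here only "Sixth")
missing_cols <- setdiff(labels, names(idx))
if (length(missing_cols) > 0) idx[, (missing_cols) := "-"]

# Drop the helper column and bind the index columns to the data
dt[, when := NULL]
result <- cbind(dt, idx)
print(result)
```

Building the index columns in a separate table keeps the renaming independent of how many columns tstrsplit happens to return for a given data set.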
One can try a solution based on apply and tidyr::separate:
# First collect, per row, the positions of the values < 500, moved to the left.
# The empty places are filled with `-`
df$DesiredData <- apply(df[2:7], 1, function(x){
  value <- which(x < 500)   # positions of the matches, not the prices themselves
  paste0(c(value, rep("-", length(x) - length(value))), collapse = ",")
})
library(tidyverse)
# Now use the `separate` function to split the column into the 6 desired columns
df %>% separate("DesiredData",
                c("First","Second","Third","Fourth","Fifth","Sixth"), sep = ",")
#   Category Price1 Price2 Price3 Price4 Price5 Price6 First Second Third Fourth Fifth Sixth
# 1 Product1    967    855    929    811    501    387     6      -     -      -     -     -
# 2 Product2    526    809    723    304    315    671     4      5     -      -     -     -
# 3 Product3    412    133    369    930    400    337     1      2     3      5     6     -
# 4 Product4    709    241    625    822    967    952     2      -     -      -     -     -
# 5 Product5    395    506    110    280    829    817     1      3     4      -     -     -
# 6 Product6    803    618    794    214    605    788     4      -     -      -     -     -
Data:
df <- read.table(text="
Category,Price1,Price2,Price3,Price4,Price5,Price6
Product1,967,855,929,811,501,387
Product2,526,809,723,304,315,671
Product3,412,133,369,930,400,337
Product4,709,241,625,822,967,952
Product5,395,506,110,280,829,817
Product6,803,618,794,214,605,788",
header = TRUE, stringsAsFactors = FALSE, sep=",")
I am working on a Shiny app where I want to display the telephone numbers of contacts. If the number is a US number, I want to show it in a specific format, for example (XXX) XXX-XXXX; otherwise I just want to return the number as it is.
I tried the simplest approach, using substr. This is the function I have:
telFormat <- function(x){
  if (is.na(x)){
    return ("")
  }
  if(substr(x,1,3) %in% c("+1 ")){
    p1 <- substr(x,4,6)
    p2 <- substr(x,8,10)
    p3 <- substr(x, 12,15)
    return (paste("(",p1,") ",p2,"-",p3, sep = ""))
  }
  else
    return (x)
}
The sample data I have is:
sample <- c("+1 312 252 7546", "+1 678 538 1919", "+44 (0) 207 743 4052",
"+44 (0) 207 743 3000", "+1 212 810 5300", NA, "+44 (0) 207 591 6630",
"+61 2 9272 2200", "+852 3903 2448", "+1 415 670 6267", "+44 (0) 207 743 3000",
"+1 212 810 5300", "+1 919 743 2500", "+1 919 743 2500", "+1 919 743 2500",
"+1 919 743 2500")
The output for the phone numbers starting with +1 gets converted correctly, but there is something wrong with the other numbers.
telFormat(sample)
#output
 [1] "(312) 252-7546" "(678) 538-1919" "( (0) 20- 743"  "( (0) 20- 743"  "(212) 810-5300" "(NA) NA-NA"     "( (0) 20- 591"
 [8] "( 2 ) 272-2200" "(2 3) 03 -448"  "(415) 670-6267" "( (0) 20- 743"  "(212) 810-5300" "(919) 743-2500" "(919) 743-2500"
[15] "(919) 743-2500" "(919) 743-2500"
I also get these warning messages:
Warning messages:
1: In if (is.na(x)) { :
the condition has length > 1 and only the first element will be used
2: In if (substr(x, 1, 3) %in% c("+1 ")) { :
the condition has length > 1 and only the first element will be used
What am I doing wrong here? Is there an efficient way to get the desired output?
If all US numbers in your data have the same format, i.e. +1 XXX XXX XXXX, you can use the regex ^\\+1 (\\d{3}) (\\d{3}) (\\d{4})$ to reformat them:
sub("^\\+1 (\\d{3}) (\\d{3}) (\\d{4})$", "(\\1) \\2-\\3", sample)
# [1] "(312) 252-7546" "(678) 538-1919" "+44 (0) 207 743 4052"
# [4] "+44 (0) 207 743 3000" "(212) 810-5300" NA
# [7] "+44 (0) 207 591 6630" "+61 2 9272 2200" "+852 3903 2448"
#[10] "(415) 670-6267" "+44 (0) 207 743 3000" "(212) 810-5300"
#[13] "(919) 743-2500" "(919) 743-2500" "(919) 743-2500"
#[16] "(919) 743-2500"
This uses capture groups (the parentheses) to match the first three, the next three and the last four digits of a US number, and refers to them in the replacement with the back references \\1, \\2 and \\3. Numbers that do not match the pattern are returned unchanged.
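As a minimal sketch, the sub() call can be wrapped into a vectorized replacement for the original telFormat. Because sub() works on the whole vector at once, there is no if() on a vector and hence no "condition has length > 1" warning. Mapping NA to an empty string mirrors the question's function; the sketch assumes US numbers always look like +1 XXX XXX XXXX.

```r
# Vectorized formatter: US numbers become (XXX) XXX-XXXX,
# everything else is returned unchanged, NA becomes "".
telFormat <- function(x) {
  out <- sub("^\\+1 (\\d{3}) (\\d{3}) (\\d{4})$", "(\\1) \\2-\\3", x)
  out[is.na(out)] <- ""   # same NA handling as the question's function
  out
}

numbers <- c("+1 312 252 7546", NA, "+44 (0) 207 743 4052", "+852 3903 2448")
formatted <- telFormat(numbers)
print(formatted)
```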
Maybe something like this with stringr helps:
library(stringr)
as.data.frame(do.call(rbind, lapply(str_match_all(sample[!is.na(sample)],
"(\\+1|.*)[^\\d]?(\\d+)[^\\d]+(\\d+)[^\\d]+(\\d+)$"), function(x) x[,2:5])))
           V1  V2   V3   V4
1          +1 312  252 7546
2          +1 678  538 1919
3  +44 (0) 20   7  743 4052
4  +44 (0) 20   7  743 3000
5          +1 212  810 5300
6  +44 (0) 20   7  591 6630
7         +61   2 9272 2200
8         +85   2 3903 2448
9          +1 415  670 6267
10 +44 (0) 20   7  743 3000
11         +1 212  810 5300
12         +1 919  743 2500
13         +1 919  743 2500
14         +1 919  743 2500
15         +1 919  743 2500
I am a beginner with R. My data looks like this:
id count date
1 210 2009.01
2 400 2009.02
3 463 2009.03
4 465 2009.04
5 509 2009.05
6 861 2009.06
7 872 2009.07
8 886 2009.08
9 725 2009.09
10 687 2009.10
11 762 2009.11
12 748 2009.12
13 678 2010.01
14 699 2010.02
15 860 2010.03
16 708 2010.04
17 709 2010.05
18 770 2010.06
19 784 2010.07
20 694 2010.08
21 669 2010.09
22 689 2010.10
23 568 2010.11
24 584 2010.12
25 592 2011.01
26 548 2011.02
27 683 2011.03
28 675 2011.04
29 824 2011.05
30 637 2011.06
31 700 2011.07
32 724 2011.08
33 629 2011.09
34 446 2011.10
35 458 2011.11
36 421 2011.12
37 459 2012.01
38 256 2012.02
39 341 2012.03
40 284 2012.04
41 321 2012.05
42 404 2012.06
43 418 2012.07
44 520 2012.08
45 546 2012.09
46 548 2012.10
47 781 2012.11
48 704 2012.12
49 765 2013.01
50 571 2013.02
51 371 2013.03
I would like to make a bar-chart-like graph that shows the count for each date (with dates in Month-Year format, for instance Jan-2009). I have two issues:
1- I cannot find a good format for a bar-chart-like graph like that.
2- I want all of my data points to be present on the X axis (date), while R aggregates them by year only (so I only have four data points there). Below is the command that I am currently using:
plot(df$date,df$count,col="red",type="h")
and my current plot is like this:
OK, I see some issues in your original data. May I suggest the following:
Add the day to your date column (sprintf keeps the trailing zero of October, e.g. 2009.10, in case the column was read as a number):
df$date=paste(sprintf('%.2f',as.numeric(df$date)),'.01',sep='')
Convert the date column to Date type:
df$date=as.Date(df$date,format='%Y.%m.%d')
Plot the data again:
plot(df$date,df$count,col="red",type="h")
One more suggestion: have you used ggplot2 for plotting charts? I think you will find it much easier, and the resulting charts look better. Your example could be visualized like this:
library(ggplot2) #if you don't have the package, run install.packages('ggplot2')
ggplot(df,aes(date, count))+geom_bar(stat='identity')+labs(x="Date", y="Count")
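To get one Month-Year label per data point, as the question asks, ggplot2's scale_x_date can be added to the same plot. The sketch below uses only the first twelve rows of the question's data and assumes the date column was read as a number; the month abbreviations produced by "%b" depend on the locale.

```r
library(ggplot2)

# First year of the question's data, with date read as a number
df <- data.frame(
  count = c(210, 400, 463, 465, 509, 861, 872, 886, 725, 687, 762, 748),
  date  = c(2009.01, 2009.02, 2009.03, 2009.04, 2009.05, 2009.06,
            2009.07, 2009.08, 2009.09, 2009.10, 2009.11, 2009.12)
)

# "%.2f" restores the trailing zero of 2009.10 before the day is appended
df$date <- as.Date(paste0(sprintf("%.2f", df$date), ".01"), format = "%Y.%m.%d")

p <- ggplot(df, aes(date, count)) +
  geom_bar(stat = "identity") +
  # one tick per month, labelled like "Jan-2009"
  scale_x_date(date_breaks = "1 month", date_labels = "%b-%Y") +
  labs(x = "Date", y = "Count") +
  # rotate the labels so that they do not overlap
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
print(p)
```

With the full data set (51 months) a wider break such as "3 months" may be easier to read.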
First, you should transform your date column into a real date:
library(plyr) # for mutate
# sprintf("%.2f", ...) keeps the trailing zero of October ("2009.10") if date was read as a number
d <- mutate(d, month = as.numeric(gsub("[0-9]*\\.([0-9]*)", "\\1", sprintf("%.2f", as.numeric(date)))),
            year = as.numeric(gsub("([0-9]*)\\.[0-9]*", "\\1", sprintf("%.2f", as.numeric(date)))),
            Date = ISOdate(year, month, 1))
Then, you could use ggplot to create a decent barchart:
library(ggplot2)
ggplot(d, aes(x = Date, y = count)) + geom_bar(fill = "red", stat = "identity")
You can also use base R to create a bar chart, which is however less nice:
dd <- setNames(d$count, format(d$Date, "%m-%Y"))
barplot(dd)
The former plot shows you the "holes" in your data, i.e. months where there is no count, while in the latter it is quite difficult to see which bar corresponds to which month (this could however be tweaked, I assume).
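One possible tweak for the base R version, sketched under the assumption that the date column was read as a number: build the Month-Year labels explicitly and draw them perpendicular to the axis. The built-in month.abb is used instead of format() so that the labels do not depend on the locale; only the first twelve rows of the question's data are used, and the name month_labels is introduced here for illustration.

```r
# First year of the question's data, with date read as a number
d <- data.frame(
  count = c(210, 400, 463, 465, 509, 861, 872, 886, 725, 687, 762, 748),
  date  = c(2009.01, 2009.02, 2009.03, 2009.04, 2009.05, 2009.06,
            2009.07, 2009.08, 2009.09, 2009.10, 2009.11, 2009.12)
)

# "%.2f" restores the trailing zero of 2009.10, then split into year and month
datestr <- sprintf("%.2f", d$date)
year    <- as.integer(substr(datestr, 1, 4))
month   <- as.integer(substr(datestr, 6, 7))

# Labels such as "Jan-2009", independent of the locale
month_labels <- paste(month.abb[month], year, sep = "-")

# las = 2 draws the labels perpendicular to the axis, cex.names shrinks them
barplot(d$count, names.arg = month_labels, las = 2, cex.names = 0.7, col = "red")
```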
Hope that helps.