Split a block of text into separate parts - R

New to R, so apologies if this is obvious.
Given a text document containing a sample block of text such as the following:
Deputy Kermit: Sir, providing access to good education for all the
Utoppia's children is one of our most important responsibilities as
States’ Members. We all recognise that. On the morning we began to
debate the future of selection in secondary education that was why
feelings ran so high, and why it was so closely fought.
But our responsibility does not stop at the doors of this Assembly.
For the sake of practicality we delegate day-to-day policy
responsibilities to individual Committees. As Deputy Fozzy has rightly
said, the Committee is the agent of the States. Ultimately it should
do what it is told. So there should be no doubt that the buck stops
with us, the States, to be sure that our agent, the Committee, has the
skills, strength and experience necessary for the task we have
assigned to it. If the Committee is not the right one for the task
ahead, especially if it is a task of vital importance to our Island,
then it is our duty to deal with that. We must remember that there is
no hierarchy here, no power to hire or fire discreetly in this
Assembly. If a Committee is in the wrong job but it does not step
down, the only tool we have to manage that is a motion of no
confidence.
Deputy Fozzy’s record too is similar, he just said that change is a
recipe for disaster. On the steps of the States after December’s
debate he told us that Utopia would rue –
The Bailiff: Deputy Fozzy.
Deputy Fozzy: That was never said on the steps of this Assembly
after the debate. I have said nothing ever like that after the debate.
I think you need to check your facts.
The Bailiff: Through the Chair,
Deputy Kermit: I repeat what I heard in the media, sir.
I would like to split each speaker's statements out into a separate file. What are my options, given that the speaker's title (in this example Deputy or Bailiff) and the character ':' may also occur within the block of text?

Not sure about the sentence breaks here; this is just an attempt.
Regex:
(^|[\W\S]\s*)(([A-Z][a-z]+\s?)+:)
Replacement:
$1\n\n$2
Output:
Deputy Kermit: Sir, providing access to good education for all the Utoppia's children is one of our most important responsibilities as States’ Members. We all recognise that. On the morning we began to debate the future of selection in secondary education that was why feelings ran so high, and why it was so closely fought.
But our responsibility does not stop at the doors of this Assembly. For the sake of practicality we delegate day-to-day policy responsibilities to individual Committees. As Deputy Fozzy has rightly said, the Committee is the agent of the States. Ultimately it should do what it is told. So there should be no doubt that the buck stops with us, the States, to be sure that our agent, the Committee, has the skills, strength and experience necessary for the task we have assigned to it. If the Committee is not the right one for the task ahead, especially if it is a task of vital importance to our Island, then it is our duty to deal with that. We must remember that there is no hierarchy here, no power to hire or fire discreetly in this Assembly. If a Committee is in the wrong job but it does not step down, the only tool we have to manage that is a motion of no confidence.
Deputy Fozzy’s record too is similar, he just said that change is a recipe for disaster. On the steps of the States after December’s debate he told us that Utopia would rue –
The Bailiff: Deputy Fozzy.
Deputy Fozzy: That was never said on the steps of this Assembly after the debate. I have said nothing ever like that after the debate. I think you need to check your facts.
The Bailiff: Through the Chair,
Deputy Kermit: I repeat what I heard in the media, sir.
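To go from the regex idea above to one file per speaker in R, one option is to treat any line that starts with a speaker label as the start of a new turn. This is a minimal sketch, assuming every turn begins at the start of a line with `Deputy <Name>:` or `The Bailiff:`; the pattern `speaker_re`, the helper names, and the file-naming scheme are illustrative assumptions, so extend the pattern for other titles. Because the pattern is anchored with `^`, a colon or a speaker's name in the middle of a line (such as "As Deputy Fozzy has rightly said") does not start a new turn.

```r
# Sketch: split a transcript into speaker turns, then write one file per speaker.
# Assumption: each turn starts at the beginning of a line with "Deputy <Name>:"
# or "The Bailiff:"; extend `speaker_re` for other titles.
speaker_re <- "^(Deputy [A-Z][a-z]+|The Bailiff):"

split_speakers <- function(lines, pattern = speaker_re) {
  is_start <- grepl(pattern, lines, perl = TRUE)
  turn_id <- cumsum(is_start)              # 0 for any lines before the first speaker
  keep <- turn_id > 0
  turns <- unname(split(lines[keep], turn_id[keep]))

  # Speaker label = the matched prefix of the turn's first line, minus the colon
  speaker <- vapply(turns, function(x) {
    sub(":$", "", regmatches(x[1], regexpr(pattern, x[1], perl = TRUE)))
  }, character(1))

  # Turn text = first line without its label, joined with the continuation lines
  text <- vapply(turns, function(x) {
    first <- sub(pattern, "", x[1], perl = TRUE)
    trimws(paste(c(first, x[-1]), collapse = " "))
  }, character(1))

  data.frame(speaker = speaker, text = text, stringsAsFactors = FALSE)
}

# Write all turns of each speaker to "<speaker>.txt", one turn per line
# (non-alphanumeric runs in the name become underscores).
write_speaker_files <- function(turns, dir = ".") {
  for (s in unique(turns$speaker)) {
    path <- file.path(dir, paste0(gsub("[^A-Za-z0-9]+", "_", s), ".txt"))
    writeLines(turns$text[turns$speaker == s], path)
  }
  invisible(NULL)
}

# Usage (the input file name is hypothetical):
# turns <- split_speakers(readLines("transcript.txt", encoding = "UTF-8"))
# write_speaker_files(turns, dir = tempdir())
```

Grouping by `cumsum()` over the "starts a turn" flags keeps wrapped continuation lines with the speaker above them, which is what the regex replacement above achieves by inserting blank lines.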


Reassign user story during sprint?

If a story is in progress and the next swim lanes are Code Review and QA Ready, how should stories be assigned? Should a story remain assigned to the developer, with the code review and QA tasks created as sub-tasks of it? Or should the developer re-assign the story when moving it to Code Review, and then, when the review is done, should the reviewer move it to the QA lane and re-assign it to QA? Re-assigning tickets as they move from In Progress to later states seems like an anti-pattern. Re-assigning tickets before they are brought into the sprint looks okay, but not after.
Scrum does not have anything to say about how the work is done or how a board is managed. However, many teams look to Kanban's "pull" approach to answer this. In that case, work is never assigned or given; it is only claimed or taken on. Therefore, work would be moved to "Code Review" by the reviewer when they begin the review. Similarly, the work would be moved to QA by the tester when they start testing. "Ready" columns are a bit of a misnomer, as they are not states. Rather, they are statuses of the previous state. If your order is Code Review - QA Ready - QA, then QA Ready is in fact a possible designation of work in Code Review. This may seem minor, but it is very important for preventing pile-ups in your process where work stalls without owners.
There is no single answer, but one way of doing it is to think of a User Story as a container of tasks, where each task is a small technical deliverable of any kind. With this mindset you can effectively stop thinking about who the assignee is, as each developer makes their own small contribution towards the goal.
One of the problems with task re-assignment is that at some point you can lose traceability of who has done what, and of productivity on a per-developer basis. In this sense, having each team member do their own tasks and deliver towards the completion of a user story solves this.
Then you can assign the User Story to the product owner, or to a developer who holds ownership of its delivery to test, at which point the tester takes over. But assigning the User Story to a developer does not mean that he owns it; it just means that he is responsible for ensuring the hand-over to test, nothing more, nothing less.
When a tester encounters a bug, you create a bug attached to the User Story.
It's feasible, though not recommended. You have to assess your current work situation. If the user story is something that can make a real difference, then it would be better to stop the sprint, reassess your situation, and make the necessary changes before continuing. Either way, when you add a new user story to the backlog, deadlines can hardly be met.
We use a slightly different approach. We have the following columns on our Jira board:
To-do
In_progress
Ready for Review
Ready for QA
In-Testing
Rework/Rejected
Done
A developer picks a task from To-do, assigns it to himself, and keeps it in In_progress. Once he is done, he moves it to Ready for Review and leaves it unassigned. Someone else picks it up, assigns it to himself, and reviews it. After reviewing, that person moves the case to Ready for QA without assigning it to anyone. Whoever is free or planning to work on the case assigns it to himself, and when he starts working on it, he moves it to In-Testing. As a result of testing, the case goes either to Rework/Rejected or to Done. If it moves to Rework/Rejected, the tester assigns it to the person who originally worked on it, and that person moves it back to In_progress when reworking it.

How to read a list of values into a data table in a sandbox?

I have a list of data. It's all a single column, each row is a comment from a post asking for book recommendations. Here's an example, containing the first 2 entries:
"My recommendations from books I read this year:<p>Bad Blood : Man, this book really does read like a Hollywood movie screenplay. The rise and fall of Theranos, documented through interviews with hundreds of ex-employees by the very author who came up with the first expose of Theranos. Truly shows the flaws in the "fake it before you make it" mindset and how we glorify "geniuses".<p>Shoe Dog : Biography of the founder of Nike. Really liked how it's not just a book glorifying the story of Nike, but tells the tale of how much effort, balance and even pure luck went into making the company the household name it is today.<p>Master Algorithm : It's a book about the different fields of Machine learning (from Bayesian to Genetic evolution algos) and talks about the pros and cons of each and how these can play together to create a "master algorithm" for learning. It's a good primer for people entering the field and while it's not a DIY, it shows the scope of the problem of learning as a whole.<p>Three Body Problem: Finally, after years of people telling me to read this (on HN and off), I read the trilogy (Remembrance of Earth's Past), and I must say, the series does live up to the hype. Not only is it fast paced and deeply philosophical, but it's presented in a format very accessible to casual readers as well (unlike many hard sci-fi books which seem to revel in complexity). If I had to describe this series in a single line, it's "What would happen if China was the country that made first contact with an alien race?"","A selection:<p>Sapiens (Yuval Noah Harari, 2014 [English]) - A bit late to the party on this one. Mostly enjoyed it, especially the early ancient history stuff, but I felt it got a bit contrived in the middle - like the author was forcing it. Overall a good read though.<p>How to Invent Everything (Ryan North, 2018) - First book I've pre-ordered in a long time. A look at the history of civilization and technology through a comedic lens. 
Pretty funny and enjoyable.<p>The Rise of Theodore Roosevelt (Edmund Morris, 1979) - Randomly happened across this book while browsing a used bookstore for some stuff to read on a summer vacation. Loved it. It's big, but reads pretty quick for a biography. I've been a fan of TR since I first really learned about him in High School and I would recommend this for anyone interested in TR/The West/Americana.<p>Jaws (Peter Benchley, 1974) - Quite a bit darker than the movie.<p>Sharp Objects (Gillian Flynn, 2006) - I enjoyed Gone Girl (book and film) so I wanted to read this before the HBO series. To be honest...not my cup of tea. It was <i>okay</i>.<p>The Art of Racing in the Rain (Garth Stein, 2008) - Made me cry on an airplane. Thankfully my coworkers were on a different flight."
(Note that the comments are separated by ",".)
I'm trying to load this list into a data table in an R sandbox (rapporter.net), but because of browser security I can't load a local file (with fread or read.table).
How can I read raw data into a data table in R?
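One possible workaround, sketched under the assumption that the data can be pasted into the script itself rather than loaded from a file: keep the blob in an R raw string (raw strings such as `r"---(...)---"` are available since R 4.0.0), so the embedded double quotes need no escaping, then split it on the `","` sequence that separates the comments. This assumes `","` never occurs inside a comment; the function name `parse_comments` is hypothetical, and the two comments below are shortened excerpts.

```r
# Sketch: turn a pasted blob of "comment","comment",... into a one-column data frame.
# Assumption: the separator "," (quote, comma, quote) never appears inside a comment.
parse_comments <- function(raw) {
  raw <- trimws(raw)
  parts <- strsplit(raw, '","', fixed = TRUE)[[1]]
  parts[1] <- sub('^"', "", parts[1])                           # drop the opening quote
  parts[length(parts)] <- sub('"$', "", parts[length(parts)])   # drop the closing quote
  data.frame(comment = parts, stringsAsFactors = FALSE)
}

# Shortened excerpts of the two comments, pasted between the raw-string delimiters
raw <- r"---("My recommendations from books I read this year:<p>Bad Blood : Truly shows the flaws in the "fake it before you make it" mindset.","A selection:<p>Jaws (Peter Benchley, 1974) - Quite a bit darker than the movie.")---"

comments <- parse_comments(raw)
```

If the data.table package is available in the sandbox, `data.table::as.data.table(comments)` converts the result into a data table.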

Character vector using tidyr

I have a dataframe:
free_text
"Lead Software Engineer Who We Are: CareerBuilder is the global leader in human capital solutions as we help people target and attract their most important asset - their people. From candidate sourcing solutions, to comprehensive workforce data, to software that streamlines your recruiting process, our focus is always about making your recruitment strategy simple, fast and effective. Are you an experienced software engineer looking to take the next step to leadership? Would you like to lead a team of agile software developers? If so, then we have an immediate need for a self-motivated software engineering lead to join the Candidate Data Processing team in our Norcross, Georgia office. The Candidate Data Processing team is responsible for processing and enriching millions of candidate profiles. We use the Amazon AWS ecosystem as well as our own in-house platform to enhance, normalize, and index candidate profiles from a variety of sources. Our projects require scalable solutions with continuous availability. CareerBuilder engineers participate in every phase of the software development lifecycle and are encouraged to have vision beyond the technical aspects of a project. This position requires knowledge in the theory and practical application of object-oriented design and programming. Prior leadership experience and experience with databases and cloud-computing technologies are desired. Your primary responsibilities as an Engineering Lead will be split between management and technical contributions. You will work with an agile project manager and a product owner to establish objectives and results, and you will lead a team of 3 to 5 software engineers to meet those objectives in a sustainable process. Some of the technologies your team will be using include: AWS (Lambda, SNS, S3, EC2, SQS, DynamoDB, etc.) 
Java or .net (Java, C#, VB.Net) Unit testing (Junit, MSTest, Moq) Relational databases (SQL) Web services (REST APIs, JSON, RestSharp) Git/github Linux (bash, cron) Job Requirements What we need from you: A passion for technology and bringing your visions to reality through code and leveraging state of the art technologies As a lead, you will take ownership of issues and challenges and will also be a proactive and effective communicator; this role requires successful verbal and written communication to many different audiences inside and outside of Careerbuilder Demonstrated ability to earn your teammates' trust and respect through clear, honest, and helpful communication We prefer you to have proven leadership experience, but also be a hands on, passionate coder BS in Computer Science or related field (preferred but not required) What you will receive: When you're focused on the goal, not the path - you can be more flexible, and that translates into more productive and satisfied employees. From flexible hours to volunteering during work hours to diverse education opportunities, CareerBuilder.com is committed to helping employees strike a balance. Training that positions you to continuously grow with ongoing learning and development courses; we never stop investing in our people. Summer Hours! Enjoy 1/2 day paid Fridays during Summer Hours Quarterly 24 hour Hackathons and bi-weekly personal development time to learn new skills Paid volunteer time and coordinated opportunities to give back to the community Bagel Fridays! Casual Dress Code and laid back environment; don't worry about buying new suits and dry cleaning bills! Comprehensive Medical, Dental & Vision Programs Education Reimbursement Program allowing up to $5k per year towards completion of a Bachelor's and non-MBA graduate degree, and up to $10K per year towards completion of an MBA! No strings attached! $400 Annual Reimbursement for Wellness Activities, including your gym membership! 
401(k) Program with Strong Employer Match and 2 year vesting schedule! Five Star Company Paid Trips for top performers, pack your bags and get ready to experience luxury! CareerBuilder, LLC is proud to be an Equal Opportunity Employer. Applicants are considered for all positions without regard to race, color, religion, sex, national origin, age, disability, sexual orientation, ancestry, marital or veteran status."
"Quality Engineer TSS is currently seeking Quality Engineer for Industrial Manufacturer in the London, KY area. Qualified candidates must have experience in Quality Engineering or related degree. Job Requirements Directs sampling inspection, and testing of produced/received parts, components and materials to determine conformance to standards. Host customers for audits, react to customer complaints, follow through on all sorting and rework of suspect parts. Control of the product sorting/hold areas of the facility. Responsible for directing, instructing and organizing the work of parts sort area. Must follow-up with efficiency, effectiveness and safety of those assigned to work the area. Provides training and completes documentation of all quality training provided to Company employees and forwarding that paperwork to the appropriate individuals (Supervisors, Engineering, Human Resources, etc.). Develop PPAP documentation for specific products; including Quality Control Plans, Flowcharts, FMEA’s, Inspection Reports, measurement/calculations coordination and PSW. Acts as Internal Auditor Coordinator and oversees the maintenance of all TS 16949 documentation. Applies statistical process control (SPC) methods for analyzing data to evaluate the current process and process changes. Works with supervisors and other responsible persons on determining root cause and developing corrective actions for all internal quality concerns. Participate in APQP for specific programs. Communicate with the customer as necessary to ensure all issues around assigned programs are resolved in a timely manner. Respond to customer corrective Action Requests. Develop gauging requirements for assigned programs. Monitor process capability to ensure required standards are maintained. Participate in Continuous Improvement programs. Perform workstation audits on assigned programs. Perform vendor quality audits as required. 
Prepares and presents technical and program information to team members and management. Accepts responsibility for subordinates’ activities; Solicits and applies customer feedback (internal and external); Fosters quality focus in others. Provides computerized status report describing progress and concerns related to inspection activities, nonconforming items, and/or other items related to the quality of the process, material, or product. Reviews quality trends, tracks the root cause of problems, and coordinates correction actions. Provides input and recommendations to management on process of procedural system improvements, such as configuration management and operations functions. Work with technicians to ensure products are measured correctly and all data is compiled for on-time PPAP submissions. Will document and review supplier quality issues to the quality files daily, and communicate any needed Corrective Actions or plans from the suppliers. Formulates contingency plans, reviews control plans and FMEAs and makes necessary updates to the database as needed. Responsibilities include training; assigning and directing work of temporary re-work employees. All other duties as assigned. Training: TS 16949 Documentation: APQP, PPAP, FMEA, MSA Internal Auditing Education Requirements: College degree or equivalent experience as determined by the Quality Manager. Skills: To perform this job successfully, an individual must be able to perform each essential job functions satisfactory. The duties and responsibilities listed above are representative of the knowledge, skill and/or ability required for the position. Excellent verbal and written skills: Proficient in computer software including Word, Excel, Access: Strong leadership skills: Good problem solving skills; Communicate well with others at all levels. Experience: To perform this position successfully, an individual should have a minimum of three (3) years in related field. "
And I try to run this code:
library(tidytext)
library(stringr)
reg <- "([^A-Za-z_\\d##']|'(?![A-Za-z_\\d##]))"
tidy_df <- df %>%
  filter(!str_detect(text, "^RT")) %>%
  mutate(text = str_replace_all(text,
                                "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&|<|>|RT|https",
                                "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
But I receive this error:
Error in stri_detect_regex(string, pattern, opts_regex = opts(pattern)) :
argument `str` should be a character vector (or an object coercible to)
Is there a problem with the input data that causes this error? What can I do to fix it?
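You forgot to load dplyr (library(dplyr)). This causes R to use stats::filter() rather than dplyr::filter(). The former has a different signature and does not evaluate its condition against the columns of df, so the inner str_detect() is not given a column of text; text then resolves to the graphics::text() function, which is not a character vector. Note also that the column in the data frame is called free_text, whereas the code refers to text, so it needs to be renamed as well.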
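A corrected version of the pipeline might look like the sketch below. Besides loading dplyr, it renames `free_text` to `text`, because the code refers to a `text` column that the data frame shown in the question does not have. The two-row `df` is a shortened stand-in for the real data, and both regular expressions are copied unchanged from the question; this assumes dplyr, stringr, and tidytext are installed.

```r
library(dplyr)     # supplies the data-masking filter()/mutate(); without it, stats::filter() is used
library(stringr)
library(tidytext)

# Shortened stand-in for the question's data frame
df <- data.frame(
  free_text = c("Lead Software Engineer Who We Are: CareerBuilder is the global leader in human capital solutions.",
                "Quality Engineer TSS is currently seeking Quality Engineer for Industrial Manufacturer in the London, KY area."),
  stringsAsFactors = FALSE
)

reg <- "([^A-Za-z_\\d##']|'(?![A-Za-z_\\d##]))"

tidy_df <- df %>%
  rename(text = free_text) %>%                 # the rest of the pipeline expects `text`
  filter(!str_detect(text, "^RT")) %>%
  mutate(text = str_replace_all(text,
                                "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&|<|>|RT|https",
                                "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,           # drop tidytext's stop words
         str_detect(word, "[a-z]"))            # keep tokens containing a letter
```

Alternatively, keep the column name and replace `text` with `free_text` in every call; renaming once at the top keeps the rest of the pipeline identical to the original.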

Online Payment System - Lottery Application (Stripe Prohibited Businesses) alternative?

I was looking to use Stripe to take online payments for an online lottery website; however, this kind of application is listed as a prohibited business.
Prohibited Businesses: Gambling
Lotteries; bidding fee auctions; sports forecasting or odds making; fantasy sports leagues with cash prizes; internet gaming; contests; sweepstakes; games of chance
Alternative options?
I was looking for an option other than Stripe that would take online payments for my application.
It is a startup business, so I would like the payment provider to handle the merchant bank account side, as Stripe and PayPal do.
The project is being developed in ASP.NET Web Forms (C#).
Any advice would be greatly appreciated.
Most countries regulate gambling in any form.
A few examples:
Some countries, like France, have a company that handles this under the authority of the government.
In the US, gambling regulation differs by state, and some states don't allow lotteries at all.
In Ireland, the latest laws allow online gambling, provided you first acquire a license issued by the state. Operating without this license can cost up to €300,000 in fines.
There is a good chance that your lottery application falls under the same regulation, in which case you have to contact the relevant authority in your country and ask how you can create a gambling application under the required law, if it is permitted at all (keeping in mind that this could be a long and tedious process).
Bottom line:
Stripe and other online payment systems do not allow these types of payments because of this regulation.
Even once past the regulatory barrier, many technical restrictions would have to be applied, such as verifying people's residence, to avoid legal issues.
UPDATE:
One option, as mentioned in the comments, would be to use Bitcoin (using it with ASP.NET) as an alternative currency to circumvent the legal issues, but that doesn't mean it is not already regulated or won't be in the near future (at the moment it legally falls into a lacuna).

How do you estimate a ROI for clearing technical debt?

I'm currently working with a fairly old product that's been saddled with a lot of technical debt from poor programmers and poor development practices in the past. We are starting to get better and the creation of technical debt has slowed considerably.
I've identified the areas of the application that are in bad shape and I can estimate the cost of fixing those areas, but I'm having a hard time estimating the return on investment (ROI).
The code will be easier to maintain and will be easier to extend in the future but how can I go about putting a dollar figure on these?
A good place to start would be going back into our bug tracking system and estimating costs based on bugs and features relating to these "bad" areas. But that seems time-consuming and may not be the best predictor of value.
Has anyone performed such an analysis in the past and have any advice for me?
Managers care about making money first and foremost through growth (e.g. new features that attract new customers) and second through optimizing the process lifecycle.
Your proposal falls in the second category, so it will undoubtedly rank behind goal #1 and get deprioritized even if it could save money, because saving money usually means spending money first (most of the time, at least ;-).
Now, putting a $ figure on the "bad technical debt" could be given a more positive spin (assuming the following applies in your case): "if we invest in reworking component X, we could introduce feature Y faster and thus get Z more customers".
In other words, evaluate the cost of technical debt against cost of lost business opportunities.
Sonar has a great plugin (the technical debt plugin) that analyzes your source code for exactly this kind of metric. You may not be able to use it directly in your build, since it is a Maven tool, but it should provide some good metrics.
Here is a snippet of their algorithm:
Debt (in man-days) =
      cost_to_fix_duplications
    + cost_to_fix_violations
    + cost_to_comment_public_API
    + cost_to_fix_uncovered_complexity
    + cost_to_bring_complexity_below_threshold
Where:
    Duplications = cost_to_fix_one_block * duplicated_blocks
    Violations   = cost_to_fix_one_violation * mandatory_violations
    Comments     = cost_to_comment_one_API * public_undocumented_api
    Coverage     = cost_to_cover_one_of_complexity * uncovered_complexity_by_tests (80% of coverage is the objective)
    Complexity   = cost_to_split_a_method * (function_complexity_distribution >= 8)
                 + cost_to_split_a_class * (class_complexity_distribution >= 60)
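The formula above can be reproduced as a back-of-the-envelope calculation. This sketch is not Sonar's implementation: the unit costs in `cost` are hypothetical placeholders in man-days, `uncovered_complexity` is assumed to already be measured against the 80% coverage objective, and `function_complexity_distribution >= 8` / `class_complexity_distribution >= 60` are read as counts of methods and classes at or above those thresholds.

```r
# Back-of-the-envelope version of the quoted debt formula (result in man-days).
# Assumption: all unit costs below are hypothetical placeholders, not Sonar defaults.
technical_debt_days <- function(duplicated_blocks, mandatory_violations,
                                public_undocumented_api, uncovered_complexity,
                                method_complexity, class_complexity,
                                cost = c(fix_one_block = 0.25,
                                         fix_one_violation = 0.1,
                                         comment_one_api = 0.05,
                                         cover_one_of_complexity = 0.02,
                                         split_a_method = 0.5,
                                         split_a_class = 1)) {
  duplications <- cost[["fix_one_block"]] * duplicated_blocks
  violations   <- cost[["fix_one_violation"]] * mandatory_violations
  comments     <- cost[["comment_one_api"]] * public_undocumented_api
  # Complexity still uncovered relative to the 80% coverage objective
  coverage     <- cost[["cover_one_of_complexity"]] * uncovered_complexity
  # Count methods/classes at or above the complexity thresholds
  complexity   <- cost[["split_a_method"]] * sum(method_complexity >= 8) +
                  cost[["split_a_class"]] * sum(class_complexity >= 60)
  duplications + violations + comments + coverage + complexity
}

# Hypothetical project: 10 duplicated blocks, 20 violations, 40 undocumented APIs,
# 100 units of uncovered complexity, per-method and per-class complexity scores.
technical_debt_days(10, 20, 40, 100,
                    method_complexity = c(3, 9, 12),
                    class_complexity = c(30, 75))
# returns 10.5 with these placeholder costs
```

Calibrating the placeholder costs against your own history (for example, how long a typical duplicated block actually took to consolidate) matters far more than the arithmetic itself.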
I think you're on the right track.
I've not had to calculate this but I've had a few discussions with a friend who manages a large software development organisation with a lot of legacy code.
One of the things we've discussed is generating some rough effort metrics from analysing VCS commits and using them to divide up a rough estimate of programmer hours. This was inspired by Joel Spolsky's Evidence-based Scheduling.
Such data mining would also let you identify when maintenance work clusters and compare that with bug completion in the tracking system (unless you are already blessed with tight integration between the two and accurate records).
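A rough sketch of the commit-mining idea (not Evidence-based Scheduling itself): given one row per commit with its author and the area of the code it touched, plus a rough estimate of each author's hours for the period, split those hours across areas in proportion to commit counts. The column names `author` and `area` and the `hours` vector are hypothetical, and commit counts are only a crude proxy for effort.

```r
# Split each author's estimated hours across code areas in proportion to
# how many of their commits touched each area. Hypothetical column names.
allocate_hours <- function(commits, hours) {
  # commits: data.frame(author, area), one row per commit
  # hours:   named numeric vector of estimated hours per author
  counts <- as.data.frame(table(author = commits$author, area = commits$area),
                          stringsAsFactors = FALSE)
  counts <- counts[counts$Freq > 0, ]                          # drop empty author/area pairs
  share <- counts$Freq / ave(counts$Freq, counts$author, FUN = sum)
  counts$hours <- share * unname(hours[counts$author])         # NA if an author has no estimate
  aggregate(hours ~ area, data = counts, FUN = sum)            # total hours per area
}
```

The per-area totals can then be set against the "bad" areas identified in the question, for example to compare hours spent on them with hours spent elsewhere.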
Proper ROI needs to calculate the full Return, so some things to consider are:
- decreased cost of maintenance (obviously)
- opportunity cost to the business of downtime or missed new features that couldn't be added in time for a release
- ability to generate new product lines due to refactorings
Remember, once you have a rule for deriving data, you can have arguments about exactly how to calculate things, but at least you have some figures to seed discussion!
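Once each component of the return has a rough figure, the ROI itself is simple arithmetic: (total return - cost) / cost. A minimal sketch, assuming every input is already an estimate in the same currency over the same time horizon; the function and argument names are illustrative. Because the arithmetic is vectorized, passing low/high estimates as vectors gives a range rather than a single number.

```r
# ROI of paying down technical debt: (total return - cost) / cost.
# Assumption: every input is an estimate in the same currency and horizon.
debt_paydown_roi <- function(cost,
                             maintenance_savings = 0,
                             opportunity_gain = 0,    # e.g. downtime avoided, features shipped in time
                             new_product_value = 0) { # e.g. product lines enabled by refactoring
  stopifnot(cost > 0)
  total_return <- maintenance_savings + opportunity_gain + new_product_value
  (total_return - cost) / cost
}

# Hypothetical figures: returns 0.25, i.e. a 25% return over the chosen horizon
debt_paydown_roi(cost = 40000, maintenance_savings = 30000, opportunity_gain = 20000)
```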
I can only speak to how to do this empirically in an iterative and incremental process.
You need to gather metrics to estimate your demonstrated best cost per story point. Presumably, this represents your system just after the initial architectural churn, when most of the design trial-and-error has been done but entropy has had the least time to cause decay. Find the point in the project history when velocity/team-size is the highest. Use this as your cost-per-point baseline (zero debt).
Over time, as technical debt accumulates, velocity/team-size begins to decrease. The percentage decrease of this number with respect to your baseline can be translated into "interest" paid on each new story point. (This is really interest paid on technical and knowledge debt.)
Disciplined refactoring and annealing cause the interest on technical debt to stabilize at some value higher than your baseline. Think of this as the steady-state interest the product owner pays on the technical debt in the system. (The same concept applies to knowledge debt.)
Some systems reach the point where the cost + interest on each new story point exceeds the value of the feature point being developed. This is when the system is bankrupt, and it's time to rewrite the system from scratch.
I think it's possible to use regression analysis to tease apart technical debt and knowledge debt (but I haven't tried it). For example, if you assume that technical debt correlates closely with some code metric, such as code duplication, you could determine how much of the increase in interest is due to technical debt versus knowledge debt.
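A minimal sketch of the velocity/team-size calculation described above, assuming you have per-iteration velocity (in story points) and team size; the function name and inputs are hypothetical. It takes the best demonstrated points per person as the zero-debt baseline and reports each iteration's percentage decrease from it, which the answer treats as the "interest". The relative extra cost per point follows as baseline / productivity - 1.

```r
# "Interest" on technical/knowledge debt from velocity per team member.
# Assumption: velocity is story points per iteration, team_size is headcount.
debt_interest <- function(velocity, team_size) {
  productivity <- velocity / team_size          # points per person per iteration
  baseline <- max(productivity)                 # best demonstrated = zero-debt baseline
  data.frame(
    iteration = seq_along(velocity),
    productivity = productivity,
    pct_decrease = 100 * (1 - productivity / baseline),   # drop vs. baseline
    extra_cost_per_point = baseline / productivity - 1    # relative cost increase per point
  )
}
```

In the answer's terms, the system approaches "bankruptcy" when the cost per point, including this extra cost, exceeds the value a story point delivers.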
+1 for jldupont's focus on lost business opportunities.
I suggest thinking about those opportunities as perceived by management. What do they think affects revenue growth -- new features, time to market, product quality? Relating debt paydown to those drivers will help management understand the gains.
Focusing on management perceptions will help you avoid false numeration. ROI is an estimate, and it is no better than the assumptions made in its estimation. Management will be suspicious of purely quantitative arguments because they know there's something qualitative in there somewhere. For example, over the short term the real cost of your debt paydown is the other work the programmers aren't doing, rather than the cash cost of those programmers, because I doubt you're going to hire and train new staff just for this. Are the improvements in future development time or quality more important than the features these programmers would otherwise be adding?
Also, make sure you understand the horizon for which the product is managed. If management isn't thinking about two years from now, they won't care about benefits that won't appear for 18 months.
Finally, reflect on the fact that management perceptions have allowed this product to get to this state in the first place. What has changed that would make the company more attentive to technical debt? If the difference is you -- you're a better manager than your predecessors -- bear in mind that your management team isn't used to thinking about this stuff. You have to find their appetite for it, and focus on those items that will deliver results they care about. If you do that, you'll gain credibility, which you can use to get them thinking about further changes. But appreciation of the gains might be a while in growing.
I mostly work alone or in small teams, so this is out of my field, but to me a great way to find out where time is wasted is very, very detailed timekeeping, for example with a handy task-bar tool like this one, which can even filter out when you go to the loo and can export everything to XML.
It may be cumbersome at first, and a challenge to introduce to a team, but if your team can log every fifteen minutes they spend due to a bug, mistake or misconception in the software, you accumulate a basis of impressive, real-life data on what technical debt is actually costing in wages every month.
The tool I linked to is my favourite because it is dead simple (it doesn't even require a database) and provides access to every project/item through a task-bar icon. Additional information on the work carried out can also be entered there, and timekeeping is literally activated in seconds. (I am not affiliated with the vendor.)
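Once such time logs exist, turning them into a monthly cost is a small aggregation. This sketch assumes the log has already been exported to a table with hypothetical columns `date`, `minutes`, and `cause` (with debt-related entries tagged as a bug, mistake, or misconception, as in the previous paragraph) and a single assumed hourly rate; it does not parse the tool's XML export, whose format is not described here.

```r
# Monthly wage cost of time logged against technical debt.
# Assumptions: hypothetical columns date (Date), minutes (numeric), cause (character);
# entries whose cause is in `debt_causes` count as debt; one flat hourly rate.
monthly_debt_cost <- function(log, hourly_rate,
                              debt_causes = c("bug", "mistake", "misconception")) {
  debt <- log[log$cause %in% debt_causes, , drop = FALSE]
  if (nrow(debt) == 0) {
    return(data.frame(month = character(0), cost = numeric(0)))
  }
  debt$month <- format(debt$date, "%Y-%m")        # bucket entries by calendar month
  debt$cost <- debt$minutes / 60 * hourly_rate    # minutes -> hours -> wages
  aggregate(cost ~ month, data = debt, FUN = sum)
}
```

A real rate would normally differ per person; joining a per-person rate table before the multiplication is a straightforward extension.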
It might be easier to estimate the amount it has cost you in the past. Once you've done that, you should be able to come up with an estimate for the future with ranges and logic even your bosses can understand.
That being said, I don't have a lot of experience with this kind of thing, simply because I've never yet seen a manager willing to go this far in fixing up code. It has always just been something we fix up when we have to modify bad code, so refactoring is effectively a hidden cost on all modifications and bug fixes.
