I couldn't find much support for this in R. I'm trying to read a number of RTF files into R to construct a data frame, but I'm struggling to find a good way to parse each RTF file and ignore its structure/formatting. There are really only two strings I want to pull from each file -- but they're nested within the structure of the file.
I've pasted a sample RTF file below. The two strings I'd like to capture are:
"Buy a 26 Inch LCD-TV Today or a 32 Inch Next Month? Modeling Purchases of High-tech Durable Products"
"The technology level [...] and managerial implications." (the full paragraph)
Any thoughts on how to efficiently parse this? I think regular expressions might help me, but I'm struggling to form the right expression to get the job done.
{\rtf1\ansi\ansicpg1252\cocoartf1265
{\fonttbl\f0\fswiss\fcharset0 ArialMT;\f1\froman\fcharset0 Times-Roman;}
{\colortbl;\red255\green255\blue255;\red0\green0\blue0;\red109\green109\blue109;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\deftab720
\itap1\trowd \taflags0 \trgaph108\trleft-108 \trbrdrt\brdrnil \trbrdrl\brdrnil \trbrdrt\brdrnil \trbrdrr\brdrnil
\clvertalt \clshdrawnil \clwWidth15680\clftsWidth3 \clbrdrt\brdrnil \clbrdrl\brdrnil \clbrdrb\brdrnil \clbrdrr\brdrnil \clpadl0 \clpadr0 \gaph\cellx8640
\itap2\trowd \taflags0 \trgaph108\trleft-108 \trbrdrt\brdrnil \trbrdrl\brdrnil \trbrdrt\brdrnil \trbrdrr\brdrnil
\clmgf \clvertalt \clshdrawnil \clwWidth14840\clftsWidth3 \clbrdrt\brdrnil \clbrdrl\brdrnil \clbrdrb\brdrnil \clbrdrr\brdrnil \clpadl0 \clpadr0 \gaph\cellx4320
\clmrg \clvertalt \clshdrawnil \clwWidth14840\clftsWidth3 \clbrdrt\brdrnil \clbrdrl\brdrnil \clbrdrb\brdrnil \clbrdrr\brdrnil \clpadl0 \clpadr0 \gaph\cellx8640
\pard\intbl\itap2\pardeftab720
\f0\b\fs26 \cf0 Buy a 26 Inch LCD-TV Today or a 32 Inch Next Month? Modeling Purchases of High-tech Durable Products\nestcell
\pard\intbl\itap2\nestcell \lastrow\nestrow
\pard\intbl\itap1\pardeftab720
\f1\b0\fs24 \cf0 \
\pard\intbl\itap1\pardeftab720
\f0\fs26 \cf0 The technology level of new high-tech durable products, such as digital cameras and LCD-TVs, continues to go up, while prices continue to go down. Consumers may anticipate these trends. In particular, a consumer faces several options. The first is to buy the current level of technology at the current price. The second is not to buy and stick with the currently owned (old) level of technology. Hence, the consumer postpones the purchase and later on buys the same level of technology at a lower price, or better technology at the same price. We develop a new model to describe consumers\'92 decisions with respect to buying these products. Our model is built on the theory of consumer expectations of price and the well-known utility maximizing framework. Since not every consumer responds the same, we allow for observed and unobserved consumer heterogeneity. We calibrate our model on a panel of several thousand consumers. We have information on the currently owned technology and on purchases in several categories of high-tech durables. Our model provides new insights in these product markets and managerial implications.\cell \lastrow\row
\pard\pardeftab720
\f1\fs24 \cf0 \
}
1) A simple way if you are on Windows is to read it in using WordPad or Word and then save it as a plain text document.
2) Alternatively, to parse it directly in R, read in the RTF file and find the lines matching the pattern pat, producing g. Then replace any \\' strings with single quotes, producing noq. Finally, remove pat and any trailing junk. This works on the sample, but you might need to revise the patterns if there are embedded \\ strings other than the \\' which we already handle:
Lines <- readLines("myfile.rtf")
pat <- "^\\\\f0.*\\\\cf0 "
g <- grep(pat, Lines, value = TRUE)
noq <- gsub("\\\\'", "'", g)
sub("\\\\.*", "", sub(pat, "", noq))
For the indicated file this is the output:
[1] "Buy a 26 Inch LCD-TV Today or a 32 Inch Next Month? Modeling Purchases of High-tech Durable Products"
[2] "The technology level of new high-tech durable products, such as digital cameras and LCD-TVs, continues to go up, while prices continue to go down. Consumers may anticipate these trends. In particular, a consumer faces several options. The first is to buy the current level of technology at the current price. The second is not to buy and stick with the currently owned (old) level of technology. Hence, the consumer postpones the purchase and later on buys the same level of technology at a lower price, or better technology at the same price. We develop a new model to describe consumers'92 decisions with respect to buying these products. Our model is built on the theory of consumer expectations of price and the well-known utility maximizing framework. Since not every consumer responds the same, we allow for observed and unobserved consumer heterogeneity. We calibrate our model on a panel of several thousand consumers. We have information on the currently owned technology and on purchases in several categories of high-tech durables. Our model provides new insights in these product markets and managerial implications."
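The '92 left in the second string comes from the RTF escape \'92: a backslash, an apostrophe and two hex digits encode one byte in the document's code page. The sample header declares \ansicpg1252, where byte 0x92 is a right single quotation mark. A minimal sketch of decoding such escapes before stripping the remaining control words could look like the following. The helper names decode_rtf_hex and extract_rtf_text and the rtf_files folder are hypothetical, and the code assumes every file uses code page 1252 and the same line layout as the sample.

```r
# Decode RTF \'hh escapes (one byte written in hex), assuming Windows-1252
# as declared by \ansicpg1252 in the sample header.
decode_rtf_hex <- function(x) {
  vapply(x, function(s) {
    m <- gregexpr("\\\\'[0-9a-fA-F]{2}", s)
    esc <- regmatches(s, m)[[1]]
    if (length(esc) == 0) return(s)
    decoded <- vapply(esc, function(e) {
      byte <- as.raw(strtoi(substring(e, 3, 4), 16L))
      iconv(rawToChar(byte), from = "CP1252", to = "UTF-8")
    }, character(1), USE.NAMES = FALSE)
    regmatches(s, m) <- list(decoded)
    s
  }, character(1), USE.NAMES = FALSE)
}

# Pull the text runs out of one file, reusing the line pattern from above.
extract_rtf_text <- function(path) {
  lines <- readLines(path, warn = FALSE)
  pat <- "^\\\\f0.*\\\\cf0 "
  hits <- grep(pat, lines, value = TRUE)
  txt <- decode_rtf_hex(sub(pat, "", hits))  # decode escapes first
  sub("\\\\.*", "", txt)                     # then drop trailing control words
}

# Hypothetical usage: one row per file, assuming each file yields exactly
# two strings (title first, then abstract).
# files  <- list.files("rtf_files", pattern = "\\.rtf$", full.names = TRUE)
# parsed <- lapply(files, extract_rtf_text)
# result <- data.frame(
#   title    = vapply(parsed, `[`, character(1), 1),
#   abstract = vapply(parsed, `[`, character(1), 2)
# )
```

Decoding has to happen before the final sub(), because that call removes everything from the first remaining backslash onwards and would otherwise cut the text at the escape.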
I am working on a case to finish my (not so advanced) data scientist course and I have already been helped a lot by topics here, thanks!
Unfortunately now I am stuck again and cannot find an existing answer.
My data comes from a bike shop, and I want to see whether the products bought during a customer's first registered purchase are related to how important that customer becomes to the shop in the future. I have grouped customers into 5 clusters (from those who registered and never made another registered purchase, through those who made 2-3 purchases for little money and those who made a few purchases for a lot of money, to those who purchase regularly and bring a lot of money to the shop), and I have ordered the clusters into an ordinal dependent variable.
As the independent variables I have prepared 20+ binary variables that identify products/services bought during the first purchase from this shop (first purchase as a registered customer). One row per customer. So I want to check the idea if there are combinations of products (probably "extras" to the bike purchase) that can increase the chance that a customer would register and hopefully stay as a loyal customer for the future.
The dream would be to be able to say, for example, that if you buy a cheap or middle-cheap bike during this first purchase, you probably don't contribute much to the bike shop in the long term, so you get a low grade on the dependent variable. But those who bought a middle-cheap bike AND a helmet AND a lock (probably at a special price) are more likely to become loyal registered customers who bring in money for a longer time.
There might be no relation like that but I want to test that anyways. Implementation of the result could be being able to recommend an extra product during a purchase (with a good price on it).
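To make the idea concrete, here is a small sketch of how one such combination could be turned into its own indicator and cross-tabulated against the ordinal cluster. Everything in it is made up for illustration: the column names (cheap_bike, helmet, lock, cluster), the six toy rows and the choice of combination are assumptions, not my real data.

```r
# Toy data: every value below is invented purely for illustration.
first_purchase <- data.frame(
  cheap_bike = c(1, 1, 1, 1, 0, 1),
  helmet     = c(0, 1, 1, 0, 1, 1),
  lock       = c(0, 1, 1, 0, 0, 1),
  cluster    = factor(c(1, 4, 5, 2, 3, 4), levels = 1:5, ordered = TRUE)
)

# Indicator for the combination "bike AND helmet AND lock".
first_purchase$bike_helmet_lock <- with(
  first_purchase,
  cheap_bike == 1 & helmet == 1 & lock == 1
)

# Cross-tabulate the combination against the ordinal loyalty cluster.
combo_table <- table(
  combo   = first_purchase$bike_helmet_lock,
  cluster = first_purchase$cluster
)
print(combo_table)
```

With 20+ binary variables there are far too many combinations to inspect by hand like this, so the table only shows what "a combination related to the cluster" would look like for one hand-picked case.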
I am learning R during this course. We went through some techniques, and at first I imagined it would be possible to work with neural networks (just because it sounded like the most fun to try), with all these products as input in a sparse matrix and the customer clusters as the output. I hoped it was similar to the examples I had read about, with a sparse matrix of pixels from a picture as the input and the numbers 1-9 as the output. But then I was told that those examples rely on pictures that contain real patterns, and in my case I don't even know whether there is any pattern.
Then I thought I could try an ordinal forest. But it doesn't predict my clusters well at all (2 out of 5 clusters get no predictions). That is OK, since I don't expect the first purchase to predict a customer's whole future. But I would really like to see whether there are combinations of products that might increase the chance that a customer ends up in one of the "higher" clusters on the loyalty scale.
I am not sure if this was clear enough. :) Do you think that there is any way of testing my idea? What could I try to do? Let me know if you need more information.
I am trying to remove "\r\n-" in a text which I extracted from a PDF file using readtext() from readtext package in R Studio. Below is my code in R:
library(readtext)
jd <- readtext("C:/Users/HomeUser/Documents/Sales Manager.pdf")
jd_text <- jd$text
jd_text2 <- gsub(pattern = "\r\n-?|•", replacement = " ", jd_text)
Below is the original extracted text jd_text:
"Sales Manager\r\nCFB Bots is a technology service provider specializing in Intelligent Automation (IA). We partner with\r\nlarge enterprises in their Digital Transformation journey and help them and their employees thrive\r\nin the Future of Work. Our mission is to co-create the Digital Workforce of the Future, and our vision\r\nis to make work enjoyable. For more information, please visit www.cfb-bots.com.\r\nWe are looking for a high performing frontrunner to blaze the trail and make new connections for\r\nour growing business. As a Sales Manager, you will play a vital role in keeping the Company\r\ncompetitive by achieving our customer acquisition and revenue growth targets. You will be the key\r\nliaison in every stage of the sales process, from planning to closing the sales.\r\nIf you are passionate about technology and are motivated by a hunger to solve our clients’\r\nchallenges, read on to find out more.\r\nYou can gain:\r\n− Incentive for achieving sales targets\r\n− Exposure to the latest industry trends and technologies\r\n− Endless learning and growth opportunities\r\n− Sharpen sales planning, analytical and management skills\r\n− Flexible work-life benefits\r\nYou will do:\r\nSales Strategy\r\n- Develop ..."
I was able to remove many "\r\n-" in jd_text using gsub(). Output from jd_text2 below:
"Sales Manager CFB Bots is a technology service provider specializing in Intelligent Automation (IA). We partner with large enterprises in their Digital Transformation journey and help them and their employees thrive in the Future of Work. Our mission is to co-create the Digital Workforce of the Future, and our vision is to make work enjoyable. For more information, please visit www.cfb-bots.com. We are looking for a high performing frontrunner to blaze the trail and make new connections for our growing business. As a Sales Manager, you will play a vital role in keeping the Company competitive by achieving our customer acquisition and revenue growth targets. You will be the key liaison in every stage of the sales process, from planning to closing the sales. If you are passionate about technology and are motivated by a hunger to solve our clients’ challenges, read on to find out more. You can gain: − Incentive for achieving sales targets − Exposure to the latest industry trends and technologies − Endless learning and growth opportunities − Sharpen sales planning, analytical and management skills − Flexible work-life benefits You will do: Sales Strategy Develop ..."
As you can see, I was able to remove "\r\n-" occurring after "Flexible work-life benefits" while "-" from those first few "\r\n-" still remained. However, when I pasted the original text extract directly from the display of jd_text in R Studio console into a new variable jd_test, applied gsub() again, I was able to accomplish my goal:
jd_test <- "Sales Manager\r\nCFB Bots is a technology service provider specializing in Intelligent Automation (IA). We partner with\r\nlarge enterprises in their Digital Transformation journey and help them and their employees thrive\r\nin the Future of Work. Our mission is to co-create the Digital Workforce of the Future, and our vision\r\nis to make work enjoyable. For more information, please visit www.cfb-bots.com.\r\nWe are looking for a high performing frontrunner to blaze the trail and make new connections for\r\nour growing business. As a Sales Manager, you will play a vital role in keeping the Company\r\ncompetitive by achieving our customer acquisition and revenue growth targets. You will be the key\r\nliaison in every stage of the sales process, from planning to closing the sales.\r\nIf you are passionate about technology and are motivated by a hunger to solve our clients’\r\nchallenges, read on to find out more.\r\nYou can gain:\r\n− Incentive for achieving sales targets\r\n− Exposure to the latest industry trends and technologies\r\n− Endless learning and growth opportunities\r\n− Sharpen sales planning, analytical and management skills\r\n− Flexible work-life benefits\r\nYou will do:\r\nSales Strategy\r\n- Develop ..."
jd_test2 <- gsub(pattern = "\r\n-?|•", replacement = " ", jd_test)
Output from jd_test2:
Sales Manager CFB Bots is a technology service provider specializing in Intelligent Automation (IA). We partner with large enterprises in their Digital Transformation journey and help them and their employees thrive in the Future of Work. Our mission is to co-create the Digital Workforce of the Future, and our vision is to make work enjoyable. For more information, please visit www.cfb-bots.com. We are looking for a high performing frontrunner to blaze the trail and make new connections for our growing business. As a Sales Manager, you will play a vital role in keeping the Company competitive by achieving our customer acquisition and revenue growth targets. You will be the key liaison in every stage of the sales process, from planning to closing the sales. If you are passionate about technology and are motivated by a hunger to solve our clients’ challenges, read on to find out more. You can gain: Incentive for achieving sales targets Exposure to the latest industry trends and technologies Endless learning and growth opportunities Sharpen sales planning, analytical and management skills Flexible work-life benefits You will do: Sales Strategy Develop ..."
Does anyone have any idea what the problem is and how I should go about fixing it? I have tried using another function, pdf_text() from the pdftools package, but it yielded the same frustrating result. At first I thought the "-" in the first few "\r\n-" was slightly longer than the later ones, but the direct copy-paste attempt seems to contradict this observation. Is there something "hidden" in the object which is not carried over during the copy-paste action? Any suggestions are greatly appreciated!
I found a likely answer to my question. It seems the original text extracted from the PDF document is not in an encoding that R Studio could recognise, which would explain why the first few "-"s were not removed. After I applied jd_text <- iconv(jd_text, "UTF-8") to convert the encoding (the second argument of iconv() is the source encoding, so this converts from UTF-8 to the session's native encoding), my problem was solved, and I am now able to remove "\r\n-" completely.
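Looking at the extracted text again, the first few bullets appear to be U+2212 (MINUS SIGN) rather than the ASCII hyphen-minus used before "Develop", which would also explain why "-?" skipped them. A minimal sketch that names both characters in the pattern, so the result does not depend on an encoding conversion, is below. The string jd_sample is a shortened excerpt of the text above, and the final whitespace squeeze is my own addition.

```r
# The two bullet characters look alike but are different code points.
utf8ToInt("-")        # 45   (ASCII hyphen-minus)
utf8ToInt("\u2212")   # 8722 (MINUS SIGN)

# Shortened excerpt of the extracted text, with both kinds of bullet.
jd_sample <- paste0(
  "You can gain:\r\n\u2212 Flexible work-life benefits",
  "\r\nYou will do:\r\nSales Strategy\r\n- Develop ..."
)

# Match a line break followed by either dash, or a bullet character.
cleaned <- gsub("\r\n[-\u2212]?|\u2022", " ", jd_sample)

# Collapse the runs of spaces left behind by the replacement.
cleaned <- gsub(" +", " ", cleaned)
cleaned
```

Calling utf8ToInt() on a single suspicious character copied out of the real string is a quick way to check whether two look-alike characters are actually the same.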
Let's say I have a strategy with multiple rules that generates multiple orders on the same symbol at the same timestamp. For example, on 2012-05-23 one rule might buy 10 shares of IBM while another rule sells 5 shares of IBM. In production, a reasonable system would use netting and execute one order to buy 5 shares, rather than one order to buy 10 shares and another order to sell 5 shares.
Is there a way to get this behaviour in quantstrat? From my experiments, quantstrat does not do netting, and for example will add transaction fees for both opposing orders as if two separate orders were executed.
If quantstrat cannot net orders then it should still be possible to obtain the desired PnL in backtesting by using a custom TxnFees function. If this is the correct way to go, how would one go about defining a custom function to net the transaction fees?
A 'reasonable system' would likely do no such thing. My experience of simultaneous execution on tick data is basically zero for aggressive orders.
On bar data, yes, internal netting would make sense, and would be handled by a production order management system. Or, for example, internalizing resting internal limit orders against other signals asking for aggressive orders on the other side, or netting positions. Does any investor of non-trivial size use bar data?
That seems to miss the point of what quantstrat is for. You are looking to figure out (in research) some strategy that makes good predictions and evaluate the quality of those predictions by writing a backtest.
Backtests aren't reality.
Further, netting would completely muddle any ability to figure out if your signal process has predictive power.
The account in blotter will net P&L automatically, so it will have the same result as your order netting, in the absence of fees. So I don't think you would need a separate TxnFees function to understand the possible impact of netting, pre-fees.
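To get a feel for how much fees alone differ between netted and un-netted execution, a rough sketch follows. It assumes, as blotter's addTxn() documentation describes, that a TxnFees function is called with TxnQty, TxnPrice and Symbol, and that fees are expressed as non-positive numbers. The per-share rate and the price of 200 are made-up values, and per_share_fee is a hypothetical name.

```r
# Hypothetical fee schedule: 0.005 currency units per share traded.
fee_per_share <- 0.005

# Fee function in the shape blotter is assumed to call it; the fee is
# returned as a non-positive number (a cost).
per_share_fee <- function(TxnQty, TxnPrice, Symbol) {
  -abs(TxnQty) * fee_per_share
}

# Two opposing orders at the same timestamp, as in the question
# (the price of 200 is invented and unused by this fee schedule).
gross_fees <- per_share_fee(10, 200, "IBM") + per_share_fee(-5, 200, "IBM")

# A single netted order for the difference.
net_fees <- per_share_fee(10 - 5, 200, "IBM")

# Extra cost attributable to not netting.
fee_drag <- gross_fees - net_fees
```

A stateless function like this cannot net fees by itself, since it sees one transaction at a time. It only quantifies the gap, which could then be compared against the strategy's P&L.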
I have a dataframe:
free_text
"Lead Software Engineer Who We Are: CareerBuilder is the global leader in human capital solutions as we help people target and attract their most important asset - their people. From candidate sourcing solutions, to comprehensive workforce data, to software that streamlines your recruiting process, our focus is always about making your recruitment strategy simple, fast and effective. Are you an experienced software engineer looking to take the next step to leadership? Would you like to lead a team of agile software developers? If so, then we have an immediate need for a self-motivated software engineering lead to join the Candidate Data Processing team in our Norcross, Georgia office. The Candidate Data Processing team is responsible for processing and enriching millions of candidate profiles. We use the Amazon AWS ecosystem as well as our own in-house platform to enhance, normalize, and index candidate profiles from a variety of sources. Our projects require scalable solutions with continuous availability. CareerBuilder engineers participate in every phase of the software development lifecycle and are encouraged to have vision beyond the technical aspects of a project. This position requires knowledge in the theory and practical application of object-oriented design and programming. Prior leadership experience and experience with databases and cloud-computing technologies are desired. Your primary responsibilities as an Engineering Lead will be split between management and technical contributions. You will work with an agile project manager and a product owner to establish objectives and results, and you will lead a team of 3 to 5 software engineers to meet those objectives in a sustainable process. Some of the technologies your team will be using include: AWS (Lambda, SNS, S3, EC2, SQS, DynamoDB, etc.) 
Java or .net (Java, C#, VB.Net) Unit testing (Junit, MSTest, Moq) Relational databases (SQL) Web services (REST APIs, JSON, RestSharp) Git/github Linux (bash, cron) Job Requirements What we need from you: A passion for technology and bringing your visions to reality through code and leveraging state of the art technologies As a lead, you will take ownership of issues and challenges and will also be a proactive and effective communicator; this role requires successful verbal and written communication to many different audiences inside and outside of Careerbuilder Demonstrated ability to earn your teammates' trust and respect through clear, honest, and helpful communication We prefer you to have proven leadership experience, but also be a hands on, passionate coder BS in Computer Science or related field (preferred but not required) What you will receive: When you're focused on the goal, not the path - you can be more flexible, and that translates into more productive and satisfied employees. From flexible hours to volunteering during work hours to diverse education opportunities, CareerBuilder.com is committed to helping employees strike a balance. Training that positions you to continuously grow with ongoing learning and development courses; we never stop investing in our people. Summer Hours! Enjoy 1/2 day paid Fridays during Summer Hours Quarterly 24 hour Hackathons and bi-weekly personal development time to learn new skills Paid volunteer time and coordinated opportunities to give back to the community Bagel Fridays! Casual Dress Code and laid back environment; don't worry about buying new suits and dry cleaning bills! Comprehensive Medical, Dental & Vision Programs Education Reimbursement Program allowing up to $5k per year towards completion of a Bachelor's and non-MBA graduate degree, and up to $10K per year towards completion of an MBA! No strings attached! $400 Annual Reimbursement for Wellness Activities, including your gym membership! 
401(k) Program with Strong Employer Match and 2 year vesting schedule! Five Star Company Paid Trips for top performers, pack your bags and get ready to experience luxury! CareerBuilder, LLC is proud to be an Equal Opportunity Employer. Applicants are considered for all positions without regard to race, color, religion, sex, national origin, age, disability, sexual orientation, ancestry, marital or veteran status."
"Quality Engineer TSS is currently seeking Quality Engineer for Industrial Manufacturer in the London, KY area. Qualified candidates must have experience in Quality Engineering or related degree. Job Requirements Directs sampling inspection, and testing of produced/received parts, components and materials to determine conformance to standards. Host customers for audits, react to customer complaints, follow through on all sorting and rework of suspect parts. Control of the product sorting/hold areas of the facility. Responsible for directing, instructing and organizing the work of parts sort area. Must follow-up with efficiency, effectiveness and safety of those assigned to work the area. Provides training and completes documentation of all quality training provided to Company employees and forwarding that paperwork to the appropriate individuals (Supervisors, Engineering, Human Resources, etc.). Develop PPAP documentation for specific products; including Quality Control Plans, Flowcharts, FMEA’s, Inspection Reports, measurement/calculations coordination and PSW. Acts as Internal Auditor Coordinator and oversees the maintenance of all TS 16949 documentation. Applies statistical process control (SPC) methods for analyzing data to evaluate the current process and process changes. Works with supervisors and other responsible persons on determining root cause and developing corrective actions for all internal quality concerns. Participate in APQP for specific programs. Communicate with the customer as necessary to ensure all issues around assigned programs are resolved in a timely manner. Respond to customer corrective Action Requests. Develop gauging requirements for assigned programs. Monitor process capability to ensure required standards are maintained. Participate in Continuous Improvement programs. Perform workstation audits on assigned programs. Perform vendor quality audits as required. 
Prepares and presents technical and program information to team members and management. Accepts responsibility for subordinates?activities; Solicits and applies customer feedback (internal and external); Fosters quality focus in others. Provides computerized status report describing progress and concerns related to inspection activities, nonconforming items, and/or other items related to the quality of the process, material, or product. Reviews quality trends, tracks the root cause of problems, and coordinates correction actions. Provides input and recommendations to management on process of procedural system improvements, such as configuration management and operations functions. Work with technicians to ensure products are measured correctly and all data is compiled for on-time PPAP submissions. Will document and review supplier quality issues to the quality files daily, and communicate any needed Corrective Actions or plans from the suppliers. Formulates contingency plans, reviews control plans and FMEAs and makes necessary updates to the database as needed. Responsibilities include training; assigning and directing work of temporary re-work employees. All other duties as assigned. Training: TS 16949 Documentation: APQP, PPAP, FMEA, MSA Internal Auditing Education Requirements: College degree or equivalent experience as determined by the Quality Manager. Skills: To perform this job successfully, an individual must be able to perform each essential job functions satisfactory. The duties and responsibilities listed above are representative of the knowledge, skill and/or ability required for the position. Excellent verbal and written skills: Proficient in computer software including Word, Excel, Access: Strong leadership skills: Good problem solving skills; Communicate well with others at all levels. Experience: To perform this position successfully, an individual should have a minimum of three (3) years in related field. "
And I am trying to run this code:
library(tidytext)
library(stringr)
reg <- "([^A-Za-z_\\d##']|'(?![A-Za-z_\\d##]))"
tidy_df <- df %>%
filter(!str_detect(text, "^RT")) %>%
mutate(text = str_replace_all(text,
"https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&|<|>|RT|https",
"")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
But I receive this error:
Error in stri_detect_regex(string, pattern, opts_regex = opts(pattern)) :
argument `str` should be a character vector (or an object coercible to)
Is there a problem with the input data that causes this error? What can I do to fix it?
You forgot to load dplyr (library(dplyr)). This causes R to use stats::filter() rather than dplyr::filter(). The former function has a different signature and does not expose free_text to the inner str_detect().
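A minimal sketch of the pipeline with dplyr attached is below. It assumes the column is named free_text, as shown in the question, so it is renamed to text first. Otherwise text would still resolve to the graphics::text() function and str_detect() would raise the same error. The two short rows stand in for the real postings, and the tweet-specific steps (the "^RT" filter, the URL clean-up and the custom regex tokenizer) are left out for brevity.

```r
library(dplyr)     # provides filter(), rename(), %>%
library(stringr)   # provides str_detect()
library(tidytext)  # provides unnest_tokens() and stop_words

# Stand-in data: two shortened rows in the shape shown in the question.
df <- data.frame(
  free_text = c("Lead Software Engineer Who We Are",
                "Quality Engineer TSS is currently seeking"),
  stringsAsFactors = FALSE
)

tidy_df <- df %>%
  rename(text = free_text) %>%        # match the name used in the pipeline
  unnest_tokens(word, text) %>%       # one lower-cased word per row
  filter(!word %in% stop_words$word,  # now dplyr::filter, not stats::filter
         str_detect(word, "[a-z]"))

tidy_df
```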
I am developing a school management system and I am struggling to decide whether I should build a [desktop app + WCF] solution or a web app (website). Which one would be best for the scenario below?
The main goals for the “Integrated Web-Based School Management and Quality Audits Software Project for Secondary Schools” are outlined below. In addition, specific objectives within each of the goals have been provided.
Goal 1:- To facilitate automated data entries in secondary schools
Objective 1:1:- To provide internet facilities and computer systems for secondary schools to further facilitate entries of student information into an integrated school management system.
Objective 1:2:- To provide teachers with the possibility to enter continuous assessment data into the computer systems for each student.
Objective 1:3:- To provide teachers with the possibility to enter end of term results for each student.
Objective 1:4:- To provide teachers with the possibility to enter students’ conduct at end of term for each student
Objective 1:5:- To provide the administration office with the possibility to register new students into the system
Objective 1:6:- To provide finance/fees office with the possibility to enter fees information for each student
Objective 1:7:- To provide parents with the possibility to access their children’s information online and provide feedback when needed or requested to do so
Goal 2:- To generate a portfolio of student information in respect of each student. A unique student identification will be used to access each student’s portfolio. The following are the main components of the portfolio.
Objective 2:1:- One of the components of the students’ portfolio page will be the Result Slip of the immediate last examination term. This will display all subjects taken by the student, continuous assessment results, examination results, grades and positions obtained in each subject, overall student position, student’s conduct and recommendation information. This report will automatically be gathered from the various inputs made from the individual teachers and staff
Objective 2:2:- Up to date historical record of Fees Information. This is vital information that will be available on each student portal. All fees due and all payments made that are entered by the fees/finance staff will be gathered by this component of the portfolio. Parents will be able to see this as well and provide feedback on any observed discrepancies.
Objective 2:3:- Attendance and Conduct report. This component of the portfolio is intended to give an account of the student’s attendance records and information on conduct as provided by the school authorities. If the information demands parent’s attention and feedback, this will be indicated here, and parents will be able to enter relevant feedback as requested.
Objective 2:4:- Completed and Pending Assignments Module. This component of the student’s portfolio will list all assignments completed by the student in the current term and will list uncompleted ones as well.
Goal 3:- To generate aggregated data for the management of the school. This will enable the school management to have a high-level overview of the student population, performance statistics for all the modules in the various classes, aggregated data on fees paid and fees pending, etc. There will be hyperlinks or select options that authorized staff can click or choose in order to reach the requested aggregated data. The main components of the management page are listed below:
Objective 3:1:- One of the components of the staff portfolio page will be the Population Statistics. This will indicate total number of students, which is expandable to also list number of males and number of females. This can further be expanded to list female and male students in the various classes
Objective 3:2:- Performance Overview is another component of the staff portfolio. This will provide a high-level overview of students’ performance. For each class and each subject, this module will list the number of Grade A students, Grade B students, and so on. These links can be further expanded to view the number of males and females who obtained the various grades in the various classes. This module will also compare grades obtained in one subject with another, to give an overview of the modules that students do very well in and those that they do not, and to help management take quick action to rectify any anomalies.
Objective 3:3:- Fees Overview is another component of the staff portfolio. This will provide fees information in the form of total fees paid within a specific period (Selectable from term, year, previous year(s), all years until current term, etc.). This information can be further expanded to show fees owed per class, payments overdue and allow the fees office to generate generic reminder messages in the form of email or text messages to parents of students who are overdue.
Reading through those requirements, it sounds like this is more than one application.
Undoubtedly you need some sort of web application (probably ASP.NET in some form?) to allow the parents of students to access their children's records.
However for security purposes this same application should probably not be used for teachers and administrative staff to edit these records. Those functions should be on a protected LAN, and require more application security for viewing or editing any potentially sensitive data (especially financial records).
I don't see where WCF would fit into this, unless you need to provide some web service support to some other system. Or perhaps it could provide an "application server" on a protected LAN that uses WCF to serve data to 2 separate applications: one for outside/public access (from separate web servers in a DMZ) and one for internal users.
There isn't really 1 answer to this question.
You said "I am struggling whether I should develop for [desktop app + wcf] or web app(website)", but it sounds like you need to develop the [desktop app + wcf] anyway, because the school administration is already using some sort of desktop application to update the data. You also need a web application for the parents to view their children's records. If you can, I strongly suggest you skip the WCF and just do a web application. At my current job, there's something similar to what rally25rs describes, and it is a pain in the ass to maintain the desktop application, the ASP.NET website and the WCF service business logic. But it sounds like you have no choice, so good luck!