Problems while trying to add an XML file to Alfresco

I am facing an issue with Alfresco, and honestly I am not an expert in this type of technology.
The idea is to add an XML file under a folder.
The code looks like this:
// the static values are:
public static final String SUSPENDRE_DESUSPENDRE_CONTENT_NAME = "suspendreDesuspendre";
private static final String SUSPENDRE_DESUSPENDRE_CONTENT_TYPE = "text/xml";
private static final String SUSPENDRE_DESUSPENDRE_CONTENT_ENCODING = "UTF-8";
private static final ContentFormat SUSPENDRE_DESUSPENDRE_CONTENT_FORMAT = new ContentFormat(SUSPENDRE_DESUSPENDRE_CONTENT_TYPE,SUSPENDRE_DESUSPENDRE_CONTENT_ENCODING);
private static final byte[] SUSPENDRE_DESUSPENDRE_CONTENT_INITIAL_BYTES = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><suspendreDesuspendre></suspendreDesuspendre>".getBytes();
@Override
public void createOrUpdateHisSuspendre(ContractBean contractbean,SuspendreDesuspendreEntree suspendreDesuspendreEntree) throws Exception
{
String parentUuid=contractbean.getUuid();
contractDAO.createAlfrescoContent(parentUuid, SUSPENDRE_DESUSPENDRE_CONTENT_NAME, SUSPENDRE_DESUSPENDRE_CONTENT_INITIAL_BYTES, SUSPENDRE_DESUSPENDRE_CONTENT_FORMAT);
}
public Reference createAlfrescoContent(String folderUuid, String contentName,byte[] contentBytes,ContentFormat contentFormat)throws RepositoryFault, RemoteException {
ParentReference parentReference = new ParentReference(new Store(Constants.WORKSPACE_STORE, "SpacesStore"), folderUuid, null, Constants.ASSOC_CONTAINS, "{" + Constants.NAMESPACE_CONTENT_MODEL + "}" + contentName);
NamedValue[] properties = new NamedValue[]{Utils.createNamedValue(Constants.PROP_NAME, contentName)};
CMLCreate create = new CMLCreate("1", parentReference, null, null, null,
Constants.TYPE_CONTENT, properties);
CML cml = new CML();
cml.setCreate(new CMLCreate[]{create});
UpdateResult[] result = WebServiceFactory.getRepositoryService().update(cml);
Reference newContentNode = result[0].getDestination();
Content content = WebServiceFactory.getContentService().write(newContentNode, Constants.PROP_CONTENT, contentBytes, contentFormat);
return content.getNode();
}
The error is:
The association source type is incorrect:
Source Node: workspace://SpacesStore/d4ffbff4-6bd6-4945-948e-2c16c1990cb9
Association: Association[ class=ClassDef[name={http://www.alfresco.org/model/content/1.0}folder], name={http://www.alfresco.org/model/content/1.0}contains, target class={http://www.alfresco.org/model/system/1.0}base, source role=null, target role=null]
Required Source Type: {http://www.alfresco.org/model/content/1.0}folder
Actual Source Type: {com.genia.cnas.alfresco.model}contratDefenseur


Spring Boot MVC -> Excel data corrupted on download

I am using the ModelAndView pattern to return an Excel representation of data that is generated in the controller with the Apache POI library.
However, the Excel file is corrupted when it is downloaded (special characters are replaced with ?). If I write the workbook to a file before pushing it out on the HTTP response, a valid Excel file is produced.
Here is the controller code that hands control to the ModelAndView:
Map<String, Object> model = new HashMap<String, Object>();
model.put(ExcelBusinessReportView.KEY_REPORT_DISPLAY_DATA, reportData);
model.put(ExcelBusinessReportView.KEY_REPORT_DATE, reportRequestDTO.getReportDateUTCAtMidnight());
return new ModelAndView("excelBusinessReportView", model);
And here is the view class:
@Service(value = "excelBusinessReportView")
public class ExcelBusinessReportView extends AbstractXlsView {
public static final String KEY_REPORT_DISPLAY_DATA = "reportData";
public static final String KEY_REPORT_DATE = "reportDate";
private static final String MIME_TYPE_EXCEL = "application/ms-excel";
private static final String HEADER_VALUE_CONTENT_DISPOSITION = "attachment; filename=qup_report.xls";
private static final String[] SUMMARY_HEADERS = ........
private static final String[] DETAIL_HEADERS = ........
@Override
protected void buildExcelDocument(Map<String, Object> model, Workbook workbook, HttpServletRequest request,
HttpServletResponse response) throws Exception {
BusinessSlotReportResource reportDisplayData = (BusinessSlotReportResource) model.get(KEY_REPORT_DISPLAY_DATA);
DateTime reportDate = (DateTime) model.get(KEY_REPORT_DATE);
// Build excel document
Sheet sheet = workbook.createSheet(reportDate.toString(CommonConstants.IST_DATE_FORMATTER_PATTERN));
sheet.setDefaultColumnWidth((short) 12);
Integer currentRow = 0;
// Build summary data
currentRow = this.buildSummaryData(workbook, sheet, reportDisplayData, currentRow);
// Create margin rows
sheet.createRow(currentRow++);
sheet.createRow(currentRow++);
// Build detail data
this.buildDetailsData(workbook, sheet, reportDisplayData, currentRow);
response.setContentType(MIME_TYPE_EXCEL);
response.setHeader(HttpHeaders.CONTENT_DISPOSITION, HEADER_VALUE_CONTENT_DISPOSITION);
}
Content of the Excel file when written to a file in the view:
–œ‡°±·;˛ˇ ˛ˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇRoot Entryˇˇˇˇˇˇˇˇ#Workbookˇˇˇˇˇˇˇˇˇˇˇˇ˛ˇˇˇ˝ˇˇˇ˛ˇˇˇ ˛ˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇ
 !"#$%&'()*+,-./0˛ˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇˇ ”ÃA·∞¡‚\panilallewar
The same part of the Excel file when downloaded:
��ࡱ�;�� ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������Root Entry��������#Workbook������������������������ ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
 !"#$%&'()*+,-./0�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������� ��A����\panilallewar
Usually, instead of putting the file in the model, I write it directly to the response.
The following code is for the xlsx format, but the concept is the same for older Excel formats.
This endpoint accepts JSON that is mapped to MyPojo.
@RequestMapping(value = "exportToExcel", method = RequestMethod.POST)
public @ResponseBody HttpEntity<byte[]> generateExcel(@Valid @RequestBody final MyPojo data) throws IOException {
final File file = File.createTempFile("MyExcelReport", ".xlsx");
file.deleteOnExit();
final Path path = file.toPath();
try (final FileOutputStream fileOut = new FileOutputStream(file)) {
try (final XSSFWorkbook workbook = new XSSFWorkbook()) {
final XSSFSheet sheet = workbook.createSheet(SHEET_NAME);
//fill your excel sheets
workbook.write(fileOut);
final byte[] byteArray = Files.readAllBytes(path);
final HttpHeaders header = new HttpHeaders();
header.setContentType(new MediaType("application", "vnd.openxmlformats-officedocument.spreadsheetml.sheet"));
header.set("Content-Disposition", "inline; filename=MyExcelReport.xlsx");
header.setContentLength(byteArray.length);
return new HttpEntity<>(byteArray, header);
} catch (final Exception e) {
LOG.error("Error during creation of excel report", e);
throw e;
} finally {
if (path != null) {
try {
Files.delete(path);
} catch (final IOException e) {
LOG.error("Unable to delete file:" + path.toString(), e);
}
}
}
}
}
Also, if you are using a frontend framework like Angular, you have to set the response type properly (https://stackoverflow.com/a/52703842/3657208).
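The replacement characters in the downloaded dump are the kind of damage that appears when binary bytes pass through a text decoding step somewhere between the server and the saved file. The following is a minimal sketch of that effect using only the JDK; it illustrates the symptom and does not diagnose this particular setup. The class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTripSketch {
    // First eight bytes of an OLE2 compound file, the container used by .xls workbooks.
    static final byte[] XLS_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Decodes the bytes as UTF-8 text and encodes them again, as a text-oriented pipeline would. */
    static byte[] throughUtf8Text(byte[] binary) {
        String asText = new String(binary, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] roundTripped = throughUtf8Text(XLS_MAGIC);
        // Invalid UTF-8 sequences are replaced with U+FFFD, so the bytes no longer match.
        System.out.println("unchanged: " + Arrays.equals(XLS_MAGIC, roundTripped));
        System.out.println("contains U+FFFD: "
                + new String(XLS_MAGIC, StandardCharsets.UTF_8).contains("\uFFFD"));
    }
}
```

If the same kind of check is run on the real response bytes, it can help narrow down whether the corruption happens on the server or in the client that saves the download.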

Is it possible to create a folder on the Alfresco site by using OpenCMIS API?

I have a presentation web script (script A) and a data web script (script B).
In script A I build a dialog that interacts with script B.
Here I build the path where the file will be uploaded (the group, year and number parameters define this path):
...
var submitHandler = function() {
var dataWebScriptUrl = window.location.protocol + '//' +
window.location.host + "/alfresco/s/ms-ws/script-b?guest=true";
var yearCombo = document.getElementById("year");
var year = yearCombo.options[yearCombo.selectedIndex].value;
var groupCombo = document.getElementById("group");
var group = groupCombo.options[groupCombo.selectedIndex].value;
var numberCombo = document.getElementById("number");
var number = numberCombo.value;
var uploadedFile = document.getElementById("uploadedFile");
var file = uploadedFile.files[0];
var formData = new FormData();
formData.append("year", year);
formData.append("group", group);
formData.append("number", number);
formData.append("uploadedFile", file);
var xhr = new XMLHttpRequest();
xhr.open("POST", dataWebScriptUrl);
xhr.send(formData);
};
...
In script B, I'm using the Apache Chemistry OpenCMIS API to create a path in the CMIS-compatible Alfresco repository:
public class CustomFileUploader extends DeclarativeWebScript implements OpenCmisConfig {
...
private void retrievePostRequestParams(WebScriptRequest req) {
String groupName = null, year = null, number = null;
FormData formData = (FormData) req.parseContent();
FormData.FormField[] fields = formData.getFields();
for(FormData.FormField field : fields) {
String fieldName = field.getName();
String fieldValue = field.getValue();
if(fieldName.equalsIgnoreCase("group")) {
if(fieldValue.equalsIgnoreCase("services")) {
groupName = "Услуги";
...
}
firstLevelFolderName = "/" + groupName;
secondLevelFolderName = groupName + " " + year;
thirdLevelFolderName = number;
}
...
Folder firstLevelFolder =
createFolderIfNotExists(cmisSession, docLibFolder, firstLevelFolderName);
...
private Folder createFolderIfNotExists(Session cmisSession,
Folder parentFolder, String folderName) {
Folder subFolder = null;
for(CmisObject child : parentFolder.getChildren()) {
if(folderName.equalsIgnoreCase(child.getName())) {
subFolder = (Folder) child;
}
}
if(subFolder == null) {
Map<String, Object> props = new HashMap<>();
props.put("cmis:objectTypeId", "cmis:folder");
props.put("cmis:name", folderName);
subFolder = parentFolder.createFolder(props);
}
return subFolder;
}
private Folder getDocLibFolder(Session cmisSession, String siteName) {
String path = "/Sites/" + siteName + "/documentLibrary";
return (Folder) cmisSession.getObjectByPath(path);
}
private Session getCmisSession() {
SessionFactory factory = SessionFactoryImpl.newInstance();
Map<String, String> conf = new HashMap<>();
// http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom
conf.put(SessionParameter.ATOMPUB_URL, ATOMPUB_URL);
conf.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
conf.put(SessionParameter.USER, USER_NAME);
conf.put(SessionParameter.PASSWORD, PASSWORD);
// "org.alfresco.cmis.client.impl.AlfrescoObjectFactoryImpl"
conf.put(SessionParameter.OBJECT_FACTORY_CLASS, OBJECT_FACTORY_CLASS);
conf.put(SessionParameter.REPOSITORY_ID, "-default-");
Session session = factory.createSession(conf);
return session;
}
...
It all works well, but I need to create the directory structure on a specific site, e.g. "contracts-site", here:
/site/contracts-site/documentlibrary
When I specify any of the following:
/Sites/contracts-site/documentLibrary/Услуги
/Sites/contracts-site/Услуги
/site/contracts-site/documentlibrary/Услуги
I get the following exception (depending on the path):
org.apache.chemistry.opencmis.commons.exceptions.CmisObjectNotFoundException: Object not found: /Sites/contracts-site/Услуги
When I specify the following:
"/Услуги"
everything works, but the directory structure is created outside the site.
How can I create a folder on an Alfresco site using the OpenCMIS API?
Aren't you missing /company_home/?
This would lead to
/company_home/Sites/contracts-site/documentLibrary/Услуги
I accidentally found the solution. It works perfectly if I specify the following path:
// locate the document library
String path = "/Сайты/contracts-site/documentLibrary";
That is, "Сайты" instead of "Sites" (Cyrillic alphabet).
I'm using the ru_RU locale and UTF-8 encoding. With that, this example also works.
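Since the display name of the sites root turned out to depend on the locale, one option is to pass that name in as a parameter and not hard-code "Sites". This is a minimal sketch under that assumption; the class and method names are hypothetical, and how the localized name is obtained is left open.

```java
final class SitePathSketch {
    /**
     * Builds the path to a site's document library.
     * sitesFolderName is the display name of the sites root, which the
     * thread found to be locale dependent ("Sites" vs "Сайты").
     */
    static String docLibPath(String sitesFolderName, String siteName) {
        return "/" + sitesFolderName + "/" + siteName + "/documentLibrary";
    }

    public static void main(String[] args) {
        // The value could come from configuration rather than a literal.
        System.out.println(docLibPath("Sites", "contracts-site"));
    }
}
```

The returned string could then be handed to the getObjectByPath call shown in getDocLibFolder.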

How to get csv file noderef by using the file name or file path in alfresco webscript

I am very new to Alfresco, and I am struggling to read a CSV file from the Alfresco repository.
I have one CSV file in the repository, in the Data Dictionary folder.
My requirement is to get the NodeRef of the CSV file and read all the data from it.
Can anyone please help me with this?
You can use a Lucene query to find a document by name, for example:
1) Alfresco Node Browser query: +PATH:"/app:company_home//*" AND +@cm\:name:"abc.txt"
2) Alfresco Share JavaScript query: var nodes = search.luceneSearch("@cm\\:name:\"abc.txt\"");
3) Alfresco Java-backed web script query:
String query = "PATH:\"/app:company_home//*\"";
query += " AND @cm\\:name:\"abc.txt\"";
StoreRef storeRef = new StoreRef(StoreRef.PROTOCOL_WORKSPACE,"SpacesStore");
ResultSet results = searchService.query(storeRef, SearchService.LANGUAGE_LUCENE, query);
for (ResultSetRow row : results) {
NodeRef currentNodeRef = row.getNodeRef();
lista.add(currentNodeRef.toString());
}
Change the query as per your requirements.
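Because the query is assembled by string concatenation, it is easy to lose the AND between the two clauses or to break the quoting. A small helper, sketched here with JDK classes only, keeps the assembly in one place. The helper name is hypothetical, and the escaping covers only backslashes and double quotes inside the file name.

```java
final class LuceneNameQuerySketch {
    /** Builds a Lucene query for a node with the given cm:name anywhere under Company Home. */
    static String byNameUnderCompanyHome(String fileName) {
        // Escape backslashes and quotes so the name stays inside the quoted phrase.
        String escaped = fileName.replace("\\", "\\\\").replace("\"", "\\\"");
        return "PATH:\"/app:company_home//*\" AND @cm\\:name:\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(byNameUnderCompanyHome("abc.txt"));
    }
}
```

The resulting string can be passed to searchService.query in place of the hand-built query variable.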
The code below returns the node for a given folder or file path, as you expect.
public static void main(String[] args) throws IOException {
client = new AlfrescoClient.Builder().connect("http://localhost:8080/alfresco", "admin", "admin").build();
NodeRepresentation node = findNodeByPath("Sites/testsite/documentLibrary/TestFolder/UploadTest.txt");
System.out.println(node == null ? "null" : node.getName());
}
private static NodeRepresentation findNodeByPath(String path) throws IOException {
if (path.startsWith("/"))
path = path.substring(1);
if (path.endsWith("/"))
path = path.substring(0, path.length() - 1);
return findNodeByPath("-root-", path);
}
private static NodeRepresentation findNodeByPath(String parentNodeId, String path) throws IOException {
String[] pathParts = path.split("/");
String name = pathParts[0];
String remaining = pathParts.length == 1 ? "" : path.substring(name.length() + 1, path.length());
List<NodeRepresentation> children = client.getNodesAPI().listNodeChildrenCall(parentNodeId).execute().body()
.getList();
for (NodeRepresentation child : children) {
if (child.getName().equals(name)) {
return pathParts.length == 1 ? child : findNodeByPath(child.getId(), remaining);
}
}
return null;
}
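The lookup above descends one path segment at a time and gives up as soon as a segment has no matching child. The same idea can be tried without a running repository; the sketch below is an iterative variant over a hypothetical in-memory Node class that stands in for the repository nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PathWalkSketch {
    /** Hypothetical in-memory node standing in for a repository node. */
    static final class Node {
        final String name;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        /** Returns the child with the given name, creating it if needed. */
        Node add(String childName) {
            return children.computeIfAbsent(childName, Node::new);
        }
    }

    /** Walks the path segment by segment; returns null if any segment is missing. */
    static Node findByPath(Node root, String path) {
        Node current = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // tolerate leading or doubled slashes
            }
            current = current.children.get(part);
            if (current == null) {
                return null;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Node root = new Node("-root-");
        root.add("Sites").add("testsite").add("documentLibrary");
        Node found = findByPath(root, "/Sites/testsite/documentLibrary");
        System.out.println(found == null ? "null" : found.name);
    }
}
```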

I am getting null for one of the column values after extraction. What's wrong with the following program?

I am getting null for one of the selected columns with IterableCSVToBean<MessageFileExtractHeader>.
DTO class:
public class MessageFileExtractHeader implements Serializable {
private static final long serialVersionUID = -3052197544136826142L;
private String mesgid;
private String mesg_type;
// getters and setters
Main Class:
public class FileExtraction {
public static void main(String[] args) throws IOException, IllegalAccessException, InvocationTargetException, InstantiationException, IntrospectionException, CsvBadConverterException, CsvDataTypeMismatchException, CsvRequiredFieldEmptyException, CsvConstraintViolationException {
Properties prop = new Properties();
ExtractFieldUtils efUtils= new ExtractFieldUtils();
MessageFileExtractHeader msgFilxtractRecord = null;
try {
InputStream inputStream =
SAADumpFileExtraction.class.getClassLoader().getResourceAsStream("config.properties");
prop.load(inputStream);
} catch (IOException e) {
e.printStackTrace();
}
String fileDirectory= prop.getProperty("file.directory");
//get the filenames
String mesgfilename= fileDirectory+prop.getProperty("mesg.file.name");
//get the headers
String mesgheader= fileDirectory+prop.getProperty("mesg.file.header.fields");
int msgskiplines=1;
CSVReader reader = null;
try {
reader = new CSVReader(new FileReader(mesgfilename));
Map<String, String> msgmapping = efUtils.getMapping(mesgheader);
HeaderColumnNameTranslateMappingStrategy<MessageFileExtractHeader> strategy = new HeaderColumnNameTranslateMappingStrategy<MessageFileExtractHeader>();
strategy.setType(MessageFileExtractHeader.class);
strategy.setColumnMapping(msgmapping);
IterableCSVToBean<MessageFileExtractHeader> msgCTBIterator= new IterableCSVToBean<MessageFileExtractHeader>(reader, strategy, null);
Iterator<MessageFileExtractHeader> mesgIterator= msgCTBIterator.iterator();
while(mesgIterator.hasNext()){
msgFilxtractRecord = mesgIterator.next();
System.out.println(msgFilxtractRecord);
//
}} finally {
reader.close();
}
}
}
Output:
MessageFileExtractHeaders [mesgid=null, mesg_type=081]
Please suggest a good way to get the mesgid.
Please post a short sample of your CSV file (the header and one line) and the value of your header property.
My guess is that there is a typo in the CSV header, in the headers in the property, or in both, so that the name does not match what is in the DTO (mesgid). Because of that, the field is not populated.
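One way to check the guess above is to compare the raw header line of the CSV file with the keys of the column mapping. The sketch below uses only the JDK and a deliberately naive comma split (no quoted fields); the class name and the sample header are made up. It also shows one possible source of a mismatch on the first column, an invisible byte order mark at the start of the file, which may or may not apply here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class HeaderCheckSketch {
    /** Returns the mapping keys that do not appear verbatim in the CSV header line. */
    static List<String> missingFromHeader(String headerLine, List<String> mappingKeys) {
        // Naive split: assumes no quoted or escaped commas in the header.
        Set<String> headerNames = new HashSet<>(Arrays.asList(headerLine.split(",")));
        List<String> missing = new ArrayList<>();
        for (String key : mappingKeys) {
            if (!headerNames.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical header line that starts with a byte order mark (U+FEFF).
        String headerLine = "\uFEFFmesgid,mesg_type";
        System.out.println(missingFromHeader(headerLine, Arrays.asList("mesgid", "mesg_type")));
    }
}
```

Any key reported as missing points at a name that differs between the file and the mapping, including differences that are not visible in an editor.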

crawler4j compile error with class CrawlConfig - VariableDeclaratorId Expected

The code will not compile. I changed the JRE to 1.7. The compiler does not highlight the class in Eclipse and the CrawlConfig appears to fail in the compiler. The class should be run from the command line in Linux.
Any ideas?
Compiler error:
Description: Syntax error on token "crawlStorageFolder", VariableDeclaratorId expected after this token
Resource: zeocrawler.java
Path: /zeowebcrawler/src/main/java/com/example
Location: line 95
Type: Java Problem
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;
public class Controller {
String crawlStorageFolder = "/data/crawl/root";
int numberOfCrawlers = 7;
CrawlConfig config = new CrawlConfig();
config.setCrawlStorageFolder(crawlStorageFolder);
PageFetcher pageFetcher = new PageFetcher(config);
RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
controller.addSeed("http://www.senym.com");
controller.addSeed("http://www.merrows.co.uk");
controller.addSeed("http://www.zeoic.com");
controller.start(MyCrawler.class, numberOfCrawlers);
}
public URLConnection connectURL(String strURL) {
URLConnection conn =null;
try {
URL inputURL = new URL(strURL);
conn = inputURL.openConnection();
int test = 0;
}catch(MalformedURLException e) {
System.out.println("Please input a valid URL");
}catch(IOException ioe) {
System.out.println("Can not connect to the URL");
}
return conn;
}
public static void updatelongurl()
{
// System.out.println("Short URL: "+ shortURL);
// urlConn = connectURL(shortURL);
// urlConn.getHeaderFields();
// System.out.println("Original URL: "+ urlConn.getURL());
/* connectURL - This function will take a valid url and return a
URL object representing the url address. */
}
public class MyCrawler extends WebCrawler {
private Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g"
+ "|png|tiff?|mid|mp2|mp3|mp4"
+ "|wav|avi|mov|mpeg|ram|m4v|pdf"
+ "|rm|smil|wmv|swf|wma|zip|rar|gz))$");
/**
* You should implement this function to specify whether
* the given url should be crawled or not (based on your
* crawling logic).
*/
@Override
public boolean shouldVisit(WebURL url) {
String href = url.getURL().toLowerCase();
return !FILTERS.matcher(href).matches() && href.startsWith("http://www.ics.uci.edu/");
}
/**
* This function is called when a page is fetched and ready
* to be processed by your program.
*/
@Override
public void visit(Page page) {
String url = page.getWebURL().getURL();
System.out.println("URL: " + url);
if (page.getParseData() instanceof HtmlParseData) {
HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
String text = htmlParseData.getText();
String html = htmlParseData.getHtml();
List<WebURL> links = htmlParseData.getOutgoingUrls();
System.out.println("Text length: " + text.length());
System.out.println("Html length: " + html.length());
System.out.println("Number of outgoing links: " + links.size());
}
}
}
This is a pretty strange error, since the code seems to be clean. Try starting Eclipse with the -clean option on the command line.
Change
String crawlStorageFolder = "/data/crawl/root";
to
String crawlStorageFolder = "./data/crawl/root";
i.e. add a leading .
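For comparison, "VariableDeclaratorId expected" is the kind of message Eclipse reports when a statement appears directly in a class body, and in the posted Controller class the call config.setCrawlStorageFolder(crawlStorageFolder) and the lines after it are not inside any method. Whether that is line 95 of the asker's file cannot be confirmed from the post. The sketch below shows the general rule using a hypothetical Config class in place of the crawler4j types.

```java
final class ControllerSketch {
    /** Hypothetical stand-in for a configuration object; not the crawler4j class. */
    static final class Config {
        private String storageFolder;

        void setStorageFolder(String folder) {
            this.storageFolder = folder;
        }

        String getStorageFolder() {
            return storageFolder;
        }
    }

    // Field declarations with initializers are allowed directly in the class body.
    static final String STORAGE_FOLDER = "/data/crawl/root";

    // A bare statement such as config.setStorageFolder(STORAGE_FOLDER); is not
    // allowed here; it has to live inside a method, constructor, or initializer block.
    static Config buildConfig() {
        Config config = new Config();
        config.setStorageFolder(STORAGE_FOLDER);
        return config;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig().getStorageFolder());
    }
}
```

Applied to the posted code, this would mean moving the configuration and controller calls into a main method or another method of Controller.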
