SearchCrawler Error, Kindly provide me solution

SearchCrawler.java

This search crawler program is used to search the pages of a website.
Compiling the following program gives me 7 compiler errors. Can anybody explain them and suggest a solution?
  1. import java.awt.*;
  2. import java.awt.event.*;
  3. import java.io.*;
  4. import java.net.*;
  5. import java.util.*;
  6. import java.util.regex.*;
  7. import javax.swing.*;
  8. import javax.swing.table.*;
  9. // The Search Web Crawler
  10. public class SearchCrawler extends JFrame
  11. {
  12.   // Max URLs drop-down values.
  13.   private static final String[] MAX_URLS =
  14.     {"50", "100", "500", "1000"};
  15.   // Cache of robot disallow lists.
  16.   private HashMap disallowListCache = new HashMap();
  17.  
  18.   // Search GUI controls.
  19.   private JTextField startTextField;
  20.   private JComboBox maxComboBox;
  21.   private JCheckBox limitCheckBox;
  22.   private JTextField logTextField;
  23.   private JTextField searchTextField;
  24.   private JCheckBox caseCheckBox;
  25.   private JButton searchButton;
  26.  
  27.   // Search stats GUI controls.
  28.   private JLabel crawlingLabel2;
  29.   private JLabel crawledLabel2;
  30.   private JLabel toCrawlLabel2;
  31.   private JProgressBar progressBar;
  32.   private JLabel matchesLabel2;
  33.  
  34.   // Table listing search matches.
  35.   private JTable table;
  36.   // Flag for whether or not crawling is underway.
  37.   private boolean crawling;
  38.   // Matches log file print writer.
  39.   private PrintWriter logFileWriter;
  40.   // Constructor for Search Web Crawler.
  41.   public SearchCrawler()
  42.   {
  43.     // Set application title.
  44.     setTitle("Search Crawler");
  45.     // Set window size.
  46.     setSize(600, 600);
  47.      // Handle window closing events.
  48.     addWindowListener(new WindowAdapter() {
  49.      public void windowClosing(WindowEvent e) {
  50.        actionExit();
  51.      }
  52.     });
  53.     // Set up File menu.
  54.     JMenuBar menuBar = new JMenuBar();
  55.     JMenu fileMenu = new JMenu("File");   
  56.     fileMenu.setMnemonic(KeyEvent.VK_F);
  57.     JMenuItem fileExitMenuItem = new JMenuItem("Exit",
  58.       KeyEvent.VK_X);
  59.     fileExitMenuItem.addActionListener(new ActionListener() {
  60.       public void actionPerformed(ActionEvent e) {  
  61.         actionExit();
  62.       }
  63.     });
  64.     fileMenu.add(fileExitMenuItem);
  65.     menuBar.add(fileMenu);
  66.     setJMenuBar(menuBar);
  67.     // Set up search panel.
  68.     JPanel searchPanel = new JPanel();
  69.     GridBagConstraints constraints;
  70.     GridBagLayout layout = new GridBagLayout();  
  71.     searchPanel.setLayout(layout);
  72.     JLabel startLabel = new JLabel("Start URL:"); 
  73.     constraints = new GridBagConstraints();
  74.     constraints.anchor = GridBagConstraints.EAST;   
  75.     constraints.insets = new Insets(5, 5, 0, 0);  
  76.     layout.setConstraints(startLabel, constraints); 
  77.     searchPanel.add(startLabel);
  78.     startTextField = new JTextField();
  79.     constraints = new GridBagConstraints();  
  80.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  81.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  82.     constraints.insets = new Insets(5, 5, 0, 5);  
  83.     layout.setConstraints(startTextField, constraints); 
  84.     searchPanel.add(startTextField);
  85.     JLabel maxLabel = new JLabel("Max URLs to Crawl:");  
  86.     constraints = new GridBagConstraints(); 
  87.     constraints.anchor = GridBagConstraints.EAST; 
  88.     constraints.insets = new Insets(5, 5, 0, 0); 
  89.     layout.setConstraints(maxLabel, constraints); 
  90.     searchPanel.add(maxLabel);
  91.     maxComboBox = new JComboBox(MAX_URLS);  
  92.     maxComboBox.setEditable(true);
  93.     constraints = new GridBagConstraints();  
  94.     constraints.insets = new Insets(5, 5, 0, 0);  
  95.     layout.setConstraints(maxComboBox, constraints); 
  96.     searchPanel.add(maxComboBox);
  97.     limitCheckBox =
  98.       new JCheckBox("Limit crawling to Start URL site"); 
  99.     constraints = new GridBagConstraints(); 
  100.     constraints.anchor = GridBagConstraints.WEST; 
  101.     constraints.insets = new Insets(0, 10, 0, 0); 
  102.     layout.setConstraints(limitCheckBox, constraints); 
  103.     searchPanel.add(limitCheckBox);
  104.     JLabel blankLabel = new JLabel();
  105.     constraints = new GridBagConstraints(); 
  106.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  107.     layout.setConstraints(blankLabel, constraints);  
  108.     searchPanel.add(blankLabel);
  109.     JLabel logLabel = new JLabel("Matches Log File:"); 
  110.     constraints = new GridBagConstraints(); 
  111.     constraints.anchor = GridBagConstraints.EAST; 
  112.     constraints.insets = new Insets(5, 5, 0, 0);
  113.     layout.setConstraints(logLabel, constraints); 
  114.     searchPanel.add(logLabel);
  115.     String file =
  116.       System.getProperty("user.dir") +
  117.       System.getProperty("file.separator") +
  118.       "crawler.log";
  119.     logTextField = new JTextField(file);
  120.     constraints = new GridBagConstraints();  
  121.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  122.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  123.     constraints.insets = new Insets(5, 5, 0, 5); 
  124.     layout.setConstraints(logTextField, constraints); 
  125.     searchPanel.add(logTextField);
  126.     JLabel searchLabel = new JLabel("Search String:"); 
  127.     constraints = new GridBagConstraints(); 
  128.     constraints.anchor = GridBagConstraints.EAST;   
  129.     constraints.insets = new Insets(5, 5, 0, 0); 
  130.     layout.setConstraints(searchLabel, constraints); 
  131.     searchPanel.add(searchLabel);
  132.     searchTextField = new JTextField();
  133.     constraints = new GridBagConstraints(); 
  134.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  135.     constraints.insets = new Insets(5, 5, 0, 0); 
  136.     constraints.gridwidth= 2;
  137.     constraints.weightx = 1.0d;
  138.     layout.setConstraints(searchTextField, constraints); 
  139.     searchPanel.add(searchTextField);
  140.     caseCheckBox = new JCheckBox("Case Sensitive"); 
  141.     constraints = new GridBagConstraints(); 
  142.     constraints.insets = new Insets(5, 5, 0, 5); 
  143.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  144.     layout.setConstraints(caseCheckBox, constraints); 
  145.     searchPanel.add(caseCheckBox);
  146.     searchButton = new JButton("Search"); 
  147.     searchButton.addActionListener(new ActionListener() {
  148.       public void actionPerformed(ActionEvent e) {
  149.         actionSearch();
  150.       }
  151.     });
  152.     constraints = new GridBagConstraints();
  153.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  154.     constraints.insets = new Insets(5, 5, 5, 5); 
  155.     layout.setConstraints(searchButton, constraints); 
  156.     searchPanel.add(searchButton);
  157.     JSeparator separator = new JSeparator();
  158.     constraints = new GridBagConstraints(); 
  159.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  160.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  161.     constraints.insets = new Insets(5, 5, 5, 5); 
  162.     layout.setConstraints(separator, constraints); 
  163.     searchPanel.add(separator);
  164.     JLabel crawlingLabel1 = new JLabel("Crawling:"); 
  165.     constraints = new GridBagConstraints(); 
  166.     constraints.anchor = GridBagConstraints.EAST; 
  167.     constraints.insets = new Insets(5, 5, 0, 0); 
  168.     layout.setConstraints(crawlingLabel1, constraints); 
  169.     searchPanel.add(crawlingLabel1);
  170.     crawlingLabel2 = new JLabel();
  171.     crawlingLabel2.setFont(
  172.       crawlingLabel2.getFont().deriveFont(Font.PLAIN)); 
  173.     constraints = new GridBagConstraints(); 
  174.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  175.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  176.     constraints.insets = new Insets(5, 5, 0, 5); 
  177.     layout.setConstraints(crawlingLabel2, constraints); 
  178.     searchPanel.add(crawlingLabel2);
  179.     JLabel crawledLabel1 = new JLabel("Crawled URLs:"); 
  180.     constraints = new GridBagConstraints(); 
  181.     constraints.anchor = GridBagConstraints.EAST; 
  182.     constraints.insets = new Insets(5, 5, 0, 0); 
  183.     layout.setConstraints(crawledLabel1, constraints); 
  184.     searchPanel.add(crawledLabel1);
  185.     crawledLabel2 = new JLabel();
  186.     crawledLabel2.setFont(
  187.       crawledLabel2.getFont().deriveFont(Font.PLAIN)); 
  188.     constraints = new GridBagConstraints(); 
  189.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  190.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  191.     constraints.insets = new Insets(5, 5, 0, 5);
  192.     layout.setConstraints(crawledLabel2, constraints); 
  193.     searchPanel.add(crawledLabel2);
  194.     JLabel toCrawlLabel1 = new JLabel("URLs to Crawl:"); 
  195.     constraints = new GridBagConstraints(); 
  196.     constraints.anchor = GridBagConstraints.EAST; 
  197.     constraints.insets = new Insets(5, 5, 0, 0); 
  198.     layout.setConstraints(toCrawlLabel1, constraints); 
  199.     searchPanel.add(toCrawlLabel1);
  200.     toCrawlLabel2 = new JLabel();
  201.     toCrawlLabel2.setFont(
  202.       toCrawlLabel2.getFont().deriveFont(Font.PLAIN)); 
  203.     constraints = new GridBagConstraints(); 
  204.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  205.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  206.     constraints.insets = new Insets(5, 5, 0, 5); 
  207.     layout.setConstraints(toCrawlLabel2, constraints); 
  208.     searchPanel.add(toCrawlLabel2);
  209.     JLabel progressLabel = new JLabel("Crawling Progress:");
  210.     constraints = new GridBagConstraints(); 
  211.     constraints.anchor = GridBagConstraints.EAST; 
  212.     constraints.insets = new Insets(5, 5, 0, 0); 
  213.     layout.setConstraints(progressLabel, constraints); 
  214.     searchPanel.add(progressLabel);
  215.     progressBar = new JProgressBar(); 
  216.     progressBar.setMinimum(0);
  217.     progressBar.setStringPainted(true);
  218.     constraints = new GridBagConstraints(); 
  219.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  220.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  221.     constraints.insets = new Insets(5, 5, 0, 5); 
  222.     layout.setConstraints(progressBar, constraints); 
  223.     searchPanel.add(progressBar);
  224.     JLabel matchesLabel1 = new JLabel("Search Matches:"); 
  225.     constraints = new GridBagConstraints(); 
  226.     constraints.anchor = GridBagConstraints.EAST; 
  227.     constraints.insets = new Insets(5, 5, 10, 0); 
  228.     layout.setConstraints(matchesLabel1, constraints); 
  229.     searchPanel.add(matchesLabel1);
  230.     matchesLabel2 = new JLabel();
  231.     matchesLabel2.setFont(
  232.       matchesLabel2.getFont().deriveFont(Font.PLAIN)); 
  233.     constraints = new GridBagConstraints(); 
  234.     constraints.fill = GridBagConstraints.HORIZONTAL; 
  235.     constraints.gridwidth = GridBagConstraints.REMAINDER; 
  236.     constraints.insets = new Insets(5, 5, 10, 5); 
  237.     layout.setConstraints(matchesLabel2, constraints); 
  238.     searchPanel.add(matchesLabel2);
  239.     // Set up matches table.
  240.     table =
  241.       new JTable(new DefaultTableModel(new Object[][]{},
  242.         new String[]{"URL"}) {
  243.       public boolean isCellEditable(int row, int column)
  244.       {
  245.         return false;
  246.       }
  247.     });
  248.     // Set up Matches panel.
  249.     JPanel matchesPanel = new JPanel(); 
  250.     matchesPanel.setBorder(
  251.       BorderFactory.createTitledBorder("Matches")); 
  252.     matchesPanel.setLayout(new BorderLayout()); 
  253.     matchesPanel.add(new JScrollPane(table),
  254.       BorderLayout.CENTER);
  255.     // Add panels to display.
  256.     getContentPane().setLayout(new BorderLayout()); 
  257.     getContentPane().add(searchPanel, BorderLayout.NORTH); 
  258.     getContentPane().add(matchesPanel,BorderLayout.CENTER);
  259.   }
  260.   // Exit this program.
  261.   private void actionExit() {
  262.     System.exit(0);
  263.   }
  264.   // Handle Search/Stop button being clicked.
  265.   private void actionSearch() {
  266.     // If stop button clicked, turn crawling flag off.
  267.     if (crawling) {
  268.       crawling = false;
  269.       return;
  270.   }
  271.   ArrayList errorList = new ArrayList();
  272.   // Validate that start URL has been entered.
  273.   String startUrl = startTextField.getText().trim();
  274.   if (startUrl.length() < 1) {
  275.     errorList.add("Missing Start URL.");
  276.   }
  277.   // Verify start URL.
  278.   else if (verifyUrl(startUrl) == null) {
  279.     errorList.add("Invalid Start URL.");
  280.   }
  281.   // Validate that Max URLs is either empty or is a number.
  282.   int maxUrls = 0;
  283.   String max = ((String) maxComboBox.getSelectedItem()).trim();
  284.   if (max.length() > 0) {
  285.     try {
  286.       maxUrls = Integer.parseInt(max);
  287.     } catch (NumberFormatException e) {
  288.     }
  289.     if (maxUrls < 1) {
  290.       errorList.add("Invalid Max URLs value.");
  291.     }
  292.   }
  293.   // Validate that matches log file has been entered.  
  294.   String logFile = logTextField.getText().trim();
  295.   if (logFile.length() < 1) {
  296.     errorList.add("Missing Matches Log File.");
  297.   }
  298.   // Validate that search string has been entered.
  299.   String searchString = searchTextField.getText().trim(); 
  300.   if (searchString.length() < 1) {
  301.     errorList.add("Missing Search String.");
  302.   }
  303.   // Show errors, if any, and return.
  304.   if (errorList.size() > 0) {
  305.     StringBuffer message = new StringBuffer();
  306.     // Concatenate errors into single message.
  307.     for (int i = 0; i < errorList.size(); i++) {
  308.       message.append(errorList.get(i));
  309.       if (i + 1 < errorList.size()) {
  310.         message.append("\n");
  311.       }
  312.     }
  313.     showError(message.toString());
  314.     return;
  315.   }
  316.   // Remove "www" from start URL if present.
  317.   startUrl = removeWwwFromUrl(startUrl);
  318.   // Start the Search Crawler.
  319.   search(logFile, startUrl, maxUrls, searchString);
  320. }
  321.  
  322. private void search(final String logFile, final String startUrl,
  323.   final int maxUrls, final String searchString)
  324. {
  325.   // Start the search in a new thread.
  326.   Thread thread = new Thread(new Runnable() {
  327.     public void run() {
  328.       // Show hour glass cursor while crawling is under way.
  329.       setCursor(Cursor.getPredefinedCursor(Cursor.WAIT_CURSOR));
  330.       // Disable search controls.
  331.       startTextField.setEnabled(false); 
  332.       maxComboBox.setEnabled(false); 
  333.       limitCheckBox.setEnabled(false); 
  334.       logTextField.setEnabled(false); 
  335.       searchTextField.setEnabled(false); 
  336.       caseCheckBox.setEnabled(false);
  337.       // Switch Search button to "Stop." 
  338.       searchButton.setText("Stop");
  339.       // Reset stats.
  340.       table.setModel(new DefaultTableModel(new Object[][]{},
  341.         new String[]{"URL"}) {
  342.         public boolean isCellEditable(int row, int column)
  343.         {
  344.           return false;
  345.         }
  346.       });
  347.        updateStats(startUrl, 0, 0, maxUrls);
  348.       // Open matches log file.
  349.       try {
  350.         logFileWriter = new PrintWriter(new FileWriter(logFile));
  351.       } catch (Exception e) {
  352.         showError("Unable to open matches log file.");  
  353.         return;
  354.       }
  355.       // Turn crawling flag on.
  356.       crawling = true;
  357.       // Perform the actual crawling.
  358.       crawl(startUrl, maxUrls, limitCheckBox.isSelected(), 
  359.         searchString, caseCheckBox.isSelected());
  360.       // Turn crawling flag off.
  361.       crawling = false;
  362.       // Close matches log file.
  363.       try {
  364.         logFileWriter.close();
  365.       } catch (Exception e) {
  366.         showError("Unable to close matches log file.");
  367.       }
  368.       // Mark search as done.
  369.       crawlingLabel2.setText("Done");
  370.       // Enable search controls.
  371.       startTextField.setEnabled(true); 
  372.       maxComboBox.setEnabled(true); 
  373.       limitCheckBox.setEnabled(true); 
  374.       logTextField.setEnabled(true); 
  375.       searchTextField.setEnabled(true); 
  376.       caseCheckBox.setEnabled(true);
  377.       // Switch search button back to "Search." 
  378.       searchButton.setText("Search");
  379.       // Return to default cursor.
  380.       setCursor(Cursor.getDefaultCursor());
  381.       // Show message if search string not found.
  382.       if (table.getRowCount() == 0) {
  383.         JOptionPane.showMessageDialog(SearchCrawler.this, 
  384.           "Your Search String was not found. Please try another.",
  385.           "Search String Not Found", 
  386.           JOptionPane.WARNING_MESSAGE);
  387.       }
  388.     }
  389.   });
  390.   thread.start();
  391. }
  392. // Show dialog box with error message.
  393. private void showError(String message) {
  394.   JOptionPane.showMessageDialog(this, message, "Error",  
  395.     JOptionPane.ERROR_MESSAGE);
  396. }
  397. // Update crawling stats.
  398. private void updateStats(
  399.   String crawling, int crawled, int toCrawl, int maxUrls)
  400. {
  401.   crawlingLabel2.setText(crawling);
  402.   crawledLabel2.setText("" + crawled); 
  403.   toCrawlLabel2.setText("" + toCrawl);
  404.   // Update progress bar.
  405.   if (maxUrls == -1) {
  406.     progressBar.setMaximum(crawled + toCrawl);
  407.   } else {
  408.     progressBar.setMaximum(maxUrls);
  409.   }
  410.   progressBar.setValue(crawled);
  411.   matchesLabel2.setText("" + table.getRowCount());
  412. }
  413. // Add match to matches table and log file.
  414. private void addMatch(String url) {
  415.   // Add URL to matches table.
  416.   DefaultTableModel model =
  417.     (DefaultTableModel) table.getModel();
  418.   model.addRow(new Object[]{url});
  419.   // Add URL to matches log file.
  420.   try {
  421.     logFileWriter.println(url);
  422.   } catch (Exception e) {
  423.     showError("Unable to log match.");
  424.   }
  425. }
  426. // Verify URL format.
  427. private URL verifyUrl(String url) {
  428.   // Only allow HTTP URLs.
  429.   if (!url.toLowerCase().startsWith("http://"))
  430.     return null;
  431.   // Verify format of URL.
  432.   URL verifiedUrl = null;
  433.   try {
  434.     verifiedUrl = new URL(url);
  435.   } catch (Exception e) {
  436.     return null;
  437.   }
  438.   return verifiedUrl;
  439. }
  440. // Check if robot is allowed to access the given URL. private boolean isRobotAllowed(URL urlToCheck) {
  441.   String host = urlToCheck.getHost().toLowerCase();
  442.   // Retrieve host's disallow list from cache.
  443.   ArrayList disallowList =(ArrayList) disallowListCache.get(host);
  444.   // If list is not in the cache, download and cache it.
  445.   if (disallowList == null) {
  446.     disallowList = new ArrayList();
  447.     try {
  448.       URL robotsFileUrl =
  449.         new URL("http://" + host + "/robots.txt");
  450.       // Open connection to robot file URL for reading. 
  451.       BufferedReader reader =
  452.         new BufferedReader(new InputStreamReader(
  453.           robotsFileUrl.openStream()));
  454.       // Read robot file, creating list of disallowed paths.
  455.       String line;
  456.       while ((line = reader.readLine()) != null) {
  457.         if (line.indexOf("Disallow:") == 0) {
  458.           String disallowPath = line.substring("Disallow:".length());
  459.           // Check disallow path for comments and remove if present.
  460.           int commentIndex = disallowPath.indexOf("#");
  461.           if (commentIndex != -1) {
  462.             disallowPath = disallowPath.substring(0, commentIndex);
  463.           }
  464.           // Remove leading or trailing spaces from disallow path.
  465.           disallowPath = disallowPath.trim();
  466.           // Add disallow path to list.
  467.           disallowList.add(disallowPath);
  468.         }
  469.       }
  470.       // Add new disallow list to cache.
  471.       disallowListCache.put(host, disallowList);
  472.     }
  473.     catch (Exception e) {
  474.       /* Assume robot is allowed since an exception
  475.          is thrown if the robot file doesn't exist. */  
  476.       return true;
  477.     }
  478.   }
  479.   /* Loop through disallow list to see if
  480.      crawling is allowed for the given URL. */
  481.   String file = urlToCheck.getFile();
  482.   for (int i = 0; i < disallowList.size(); i++) {
  483.     String disallow = (String) disallowList.get(i);
  484.     if (file.startsWith(disallow)) {
  485.       return false;
  486.     }
  487.   }
  488.   return true;
  489. }
  490. // Download page at given URL.
  491. private String downloadPage(URL pageUrl) {
  492.   try {
  493.     // Open connection to URL for reading.
  494.     BufferedReader reader = new BufferedReader(new InputStreamReader(
  495.         pageUrl.openStream()));
  496.     // Read page into buffer.
  497.     String line;
  498.     StringBuffer pageBuffer = new StringBuffer();
  499.     while ((line = reader.readLine()) != null) {
  500.       pageBuffer.append(line);
  501.     }
  502.     return pageBuffer.toString();
  503.   } catch (Exception e) {
  504.   }
  505.   return null;
  506. }
  507. // Remove leading "www" from a URL's host if present.
  508. private String removeWwwFromUrl(String url) {
  509.   int index = url.indexOf("://www.");
  510.   if (index != -1) {
  511.     return url.substring(0, index + 3) +
  512.       url.substring(index + 7);
  513.   }
  514.   return (url);
  515. }
  516. // Parse through page contents and retrieve links.
  517. private ArrayList retrieveLinks(
  518.   URL pageUrl, String pageContents, HashSet crawledList,  
  519.   boolean limitHost)
  520. {
  521.   // Compile link matching pattern.
  522.   Pattern p =
  523.     Pattern.compile("<a\\s+href\\s*=\\s*\"?(.*?)[\"|>]", 
  524.       Pattern.CASE_INSENSITIVE);
  525.   Matcher m = p.matcher(pageContents);
  526.   // Create list of link matches.
  527.   ArrayList linkList = new ArrayList();
  528.   while (m.find()) {
  529.     String link = m.group(1).trim();
  530.     // Skip empty links.
  531.     if (link.length() < 1) {
  532.       continue;
  533.     }
  534.     // Skip links that are just page anchors.
  535.     if (link.charAt(0) == '#') {
  536.       continue;
  537.     }
  538.     // Skip mailto links.
  539.     if (link.indexOf("mailto:") != -1) {
  540.       continue;
  541.     }
  542.     // Skip JavaScript links.
  543.     if (link.toLowerCase().indexOf("javascript") != -1) {
  544.       continue;
  545.     }
  546.     // Prefix absolute and relative URLs if necessary.
  547.     if (link.indexOf("://") == -1) {
  548.       // Handle absolute URLs.
  549.       if (link.charAt(0) == '/') {
  550.         link = "http://" + pageUrl.getHost() + link;
  551.       // Handle relative URLs.
  552.       } else {
  553.         String file = pageUrl.getFile();
  554.         if (file.indexOf('/') == -1) {
  555.           link = "http://" + pageUrl.getHost() + "/" + link;
  556.         } else {
  557.           String path =
  558.             file.substring(0, file.lastIndexOf('/') + 1);  
  559.           link = "http://" + pageUrl.getHost() + path + link;
  560.         }
  561.       }
  562.     }
  563.     // Remove anchors from link.
  564.     int index = link.indexOf('#');
  565.     if (index != -1) {
  566.       link = link.substring(0, index);
  567.     }
  568.     // Remove leading "www" from URL's host if present.  
  569.     link = removeWwwFromUrl(link);
  570.     // Verify link and skip if invalid.
  571.     URL verifiedLink = verifyUrl(link);
  572.     if (verifiedLink == null) {
  573.       continue;
  574.     }
  575.     /* If specified, limit links to those
  576.       having the same host as the start URL. */
  577.     if (limitHost &&
  578.         !pageUrl.getHost().toLowerCase().equals(
  579.           verifiedLink.getHost().toLowerCase()))  
  580.     {
  581.       continue;
  582.     }
  583.     // Skip link if it has already been crawled.
  584.     if (crawledList.contains(link)) {
  585.       continue;
  586.     }
  587.     // Add link to list.
  588.     linkList.add(link);
  589.   }
  590.   return (linkList);
  591. }
  592. /* Determine whether or not search string is
  593.    matched in the given page contents. */
  594. private boolean searchStringMatches(
  595.   String pageContents, String searchString,
  596.   boolean caseSensitive)
  597. {
  598.   String searchContents = pageContents;
  599.   /* If case-insensitive search, lowercase
  600.      page contents for comparison. */
  601.   if (!caseSensitive) {
  602.     searchContents = pageContents.toLowerCase();
  603.   }
  604.   // Split search string into individual terms.
  605.   Pattern p = Pattern.compile("[\\s]+");
  606.   String[] terms = p.split(searchString);
  607.   // Check to see if each term matches.
  608.   for (int i = 0; i < terms.length; i++) {
  609.     if (caseSensitive) {
  610.       if (searchContents.indexOf(terms[i]) == -1) {
  611.         return false;
  612.       }
  613.     } else {
  614.       if (searchContents.indexOf(terms[i].toLowerCase()) == -1) {
  615.         return false;
  616.       }
  617.     }
  618.   }
  619.   return true;
  620. }
  621. // Perform the actual crawling, searching for the search string.
  622. public void crawl(
  623.   String startUrl, int maxUrls, boolean limitHost,
  624.   String searchString, boolean caseSensitive)
  625. {
  626.   // Set up crawl lists.
  627.   HashSet crawledList = new HashSet();
  628.   LinkedHashSet toCrawlList = new LinkedHashSet();
  629.   // Add start URL to the to crawl list.
  630.   toCrawlList.add(startUrl);
  631.   /* Perform actual crawling by looping
  632.     through the To Crawl list. */
  633.   while (crawling && toCrawlList.size() > 0)
  634.   {
  635.     /* Check to see if the max URL count has
  636.        been reached, if it was specified.*/
  637.     if (maxUrls != -1) {
  638.       if (crawledList.size() == maxUrls) {
  639.         break;
  640.     }
  641.   }
  642.   // Get URL at top of the list.
  643.   String url = (String) toCrawlList.iterator().next();
  644.   // Remove URL from the To Crawl list.
  645.   toCrawlList.remove(url);
  646.   // Convert string url to URL object.
  647.   URL verifiedUrl = verifyUrl(url);
  648.   // Skip URL if robots are not allowed to access it.
  649.   if (!isRobotAllowed(verifiedUrl)) {
  650.     continue;
  651.   }
  652.   // Update crawling stats.
  653.   updateStats(url, crawledList.size(), toCrawlList.size(), 
  654.     maxUrls);
  655.   // Add page to the crawled list.
  656.   crawledList.add(url);
  657.   // Download the page at the given URL.
  658.   String pageContents = downloadPage(verifiedUrl);
  659.   /* If the page was downloaded successfully, retrieve all its
  660.      links and then see if it contains the search string. */
  661.   if (pageContents != null && pageContents.length() > 0)
  662.   {
  663.     // Retrieve list of valid links from page.
  664.     ArrayList links =
  665.       retrieveLinks(verifiedUrl, pageContents, crawledList,
  666.         limitHost);
  667.     // Add links to the To Crawl list.
  668.     toCrawlList.addAll(links);
  669.     /* Check if search string is present in
  670.        page, and if so, record a match. */
  671.     if (searchStringMatches(pageContents, searchString,  
  672.          caseSensitive))
  673.     {
  674.       addMatch(url);
  675.     }
  676.   }
  677.   // Update crawling stats.
  678.      updateStats(url, crawledList.size(), toCrawlList.size(),
  679.        maxUrls);
  680.     }
  681.   }
  682.   // Run the Search Crawler.
  683.   public static void main(String[] args) {
  684.     SearchCrawler crawler = new SearchCrawler();
  685.     crawler.setVisible(true);
  686.   }
  687. }
  688.  
-------------------------------------------------------------
After compilation
-------------------------------------------------------------
SearchCrawler.java:445: illegal start of type
  if (disallowList == null) {
  ^
SearchCrawler.java:481: <identifier> expected
  String file = urlToCheck.getFile();
                                    ^
SearchCrawler.java:482: illegal start of type
  for (int i = 0; i < disallowList.size(); i++) {
  ^
SearchCrawler.java:488: <identifier> expected
  return true;
             ^
SearchCrawler.java:491: 'class' or 'interface' expected
private String downloadPage(URL pageUrl) {
        ^
SearchCrawler.java:687: 'class' or 'interface' expected
}
^
SearchCrawler.java:689: 'class' or 'interface' expected
^
7 errors
Oct 6 '08 #1
r035198x
1.) Please use code tags when posting code.
2.) Post the error messages you got here as well.
Oct 6 '08 #2
r035198x
I see now that you did post the error messages (see the good thing about code tags?).

Check your braces.
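
Going by the posted listing, the brace count goes wrong at line 440: the header `private boolean isRobotAllowed(URL urlToCheck) {` sits on the same line as the `//` comment, so the compiler treats it, opening brace included, as comment text. Lines 441 and 443 then parse as field declarations, and the `if` on line 445 is the first thing that cannot appear at class level, which matches the first error. Moving the header onto its own line should clear the errors. Here is a minimal sketch of the corrected shape. The class name `RobotCheckSketch` and the simplified parameters (a path string and a ready-made disallow list instead of a `URL` and the cache) are assumptions made so the example compiles on its own; they are not part of the posted program.

```java
import java.util.Arrays;
import java.util.List;

public class RobotCheckSketch {
    // Check if robot is allowed to access the given path.
    // The method header below starts on its own line. If it shared a line
    // with the "//" comment above, it would become part of that comment.
    public static boolean isRobotAllowed(String file, List<String> disallowList) {
        // Same prefix test as the loop on lines 482-487 of the listing.
        for (String disallow : disallowList) {
            if (file.startsWith(disallow)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical disallow list, for illustration only.
        List<String> disallowList = Arrays.asList("/private/", "/tmp/");
        System.out.println(isRobotAllowed("/index.html", disallowList));     // prints true
        System.out.println(isRobotAllowed("/private/a.html", disallowList)); // prints false
    }
}
```

In the original file the same fix is just a line break: end the comment after "given URL." and start `private boolean isRobotAllowed(URL urlToCheck) {` on the next line.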
Oct 6 '08 #3
