Bytes | Software Development & Data Engineering Community

Doubt about the reliability of Java regex

sukatoa
539 Contributor
Good day

Is it safe to use Java regex to extract links and email addresses from a specific web page? (This is not intended for malicious activity.)

We are in our final stage in college, and we have already proposed a project that needs an extraction tool. We decided to create our own (we still have 3 weeks to decide whether to hand-code it or to use 3rd-party libraries for parsing HTML).

I'm in doubt whether to use Java regular expressions or a 3rd-party HTML parser, because of some articles like this one.
Aug 23 '09 #1

JosAH
11,448 Recognized Expert MVP
@sukatoa
Regular expressions in Java are reliable, but they are limited to regular languages, and HTML isn't a regular language. I use the word 'regular' in its theoretical sense here. One of the limitations is that regular languages (and so regular expressions) cannot describe arbitrarily nested structures; you need a full-fledged parser for that. The SAX parser that ships with Java is such a parser, provided the page is well-formed (XHTML). Email addresses don't contain nested structures and can be matched by regular expressions. Google for some of them; there are a lot around on the net.
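
To make the nesting limitation concrete, here is a small sketch. The class name `NestedTagDemo`, the method `firstDivBody` and the sample markup are made up for illustration; only `java.util.regex` is real. The pattern pairs the outer opening tag with the first closing tag it meets, which belongs to the inner element:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedTagDemo {
    // Naive attempt to capture the body of a <div> element.
    private static final Pattern DIV = Pattern.compile("<div>(.*?)</div>");

    // Returns whatever the pattern believes is the body of the first <div>,
    // or null if nothing matches.
    static String firstDivBody(String html) {
        Matcher m = DIV.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<div>outer <div>inner</div> tail</div>";
        // The capture stops at the inner </div>, so the outer element is cut short.
        System.out.println(firstDivBody(html));
    }
}
```

Making the quantifier greedy only moves the problem: with two sibling elements it would then swallow both of them in one match.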

My advice would be to use a SAX parser to pull the candidate pieces of text out of the HTML, and then check each piece with a regular expression to decide whether or not it could be a valid email address.
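
A rough sketch of that two-step approach could look like the code below. It rests on several assumptions: the input is well-formed XHTML (the JDK's SAX parser rejects sloppy HTML), links are taken to be the `href` attributes of `a` elements, and the email pattern is a deliberately simple one, not a complete validator. The class name `LinkAndEmailExtractor` and the sample document are invented for the example.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LinkAndEmailExtractor {
    // Simplified email pattern (an assumption; real addresses can be more exotic).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    final Set<String> links = new LinkedHashSet<>();
    final Set<String> emails = new LinkedHashSet<>();

    // Adds every substring of the given text that looks like an email address.
    private void collectEmails(CharSequence text) {
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            emails.add(m.group());
        }
    }

    // Step 1: let SAX walk the document. Step 2: run the regex on what it found.
    static LinkAndEmailExtractor extract(String xhtml) throws Exception {
        LinkAndEmailExtractor result = new LinkAndEmailExtractor();
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("a".equalsIgnoreCase(qName)) {
                    String href = attrs.getValue("href");
                    if (href != null) {
                        result.links.add(href);
                        result.collectEmails(href); // catches mailto: links
                    }
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks, so accumulate.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                text.append(' '); // keep text of neighbouring elements apart
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        result.collectEmails(text);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>Write to jos@example.com or "
                + "<a href=\"http://example.org/page\">visit</a>.</p>"
                + "<a href=\"mailto:sukatoa@example.net\">mail</a></body></html>";
        LinkAndEmailExtractor r = extract(page);
        System.out.println("links:  " + r.links);
        System.out.println("emails: " + r.emails);
    }
}
```

For pages that are not well-formed, the same handler idea still applies, but the parsing step would have to be done by a more tolerant HTML parser than the one in the JDK.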

kind regards,

Jos
Aug 23 '09 #2
sukatoa
539 Contributor
Thanks for your advice Jos :)
Aug 23 '09 #3
JosAH
11,448 Recognized Expert MVP
@sukatoa
You're welcome of course and good luck with your project.

kind regards,

Jos
Aug 23 '09 #4
