Java Regular Expressions: Study Notes 2

Example 3:
Data extraction
Requirement: from a piece of HTML, extract every email address and the link address in every <a href...> tag.

// HtmlTest.java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTest {
    public static void main(String[] args) {
        // Sample input: quoted, unquoted and space-padded href values,
        // plus one email address that sits outside any tag.
        String htmlText = "<html>"
                + "<a href=\"testone@163.com\">163test</a>\n"
                + "<a href='www.163.com@163-com.com'>163news</a>\n"
                + "<a href=http://www.163.com>163lady</a>\n"
                + "<a href = http://sports.163.com>网易体育</a>\n"
                + "<a href = \"http://gz.house.163.com\">网易房产</a>\n"
                + ".leemaster@163" + "luckdog.com" + "</html>";
        System.out.println("Checking for emails");
        for (String email : extractEmail(htmlText)) {
            System.out.println("Email: " + email);
        }
        System.out.println("Checking for links");
        for (String link : extractLink(htmlText)) {
            System.out.println("Link: " + link);
        }
    }

    // Collects each full match: the tag from "<a" up to and including the
    // character that ends the href value. m.group(1) would be the address alone.
    private static List<String> extractLink(String htmlText) {
        List<String> result = new ArrayList<>();
        Pattern p = Pattern.compile(Regexes.HREF_LINK_REGEX);
        Matcher m = p.matcher(htmlText);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Collects every substring that matches the email pattern.
    private static List<String> extractEmail(String htmlText) {
        List<String> result = new ArrayList<>();
        Pattern p = Pattern.compile(Regexes.EMAIL_REGEX);
        Matcher m = p.matcher(htmlText);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }
}

// Regexes.java (a separate file, because each public class needs its own)
public class Regexes {
    // (?i) makes the match case-insensitive. (?<=\b) and (?=\b) behave like a plain \b.
    // The top-level domain is limited to 2-4 letters, so longer ones are not matched.
    public static final String EMAIL_REGEX =
            "(?i)(?<=\\b)[a-z0-9][-a-z0-9_.]+[a-z0-9]@([a-z0-9][-a-z0-9]+\\.)+[a-z]{2,4}(?=\\b)";
    // Group 1 captures the href value, with or without surrounding quotes.
    public static final String HREF_LINK_REGEX
            = "(?i)<a\\s+href\\s*=\\s*['\"]?([^'\"\\s>]+)['\"\\s>]";
}

Output:
Checking for emails
Email: testone@163.com
Email: www.163.com@163-com.com
Email: leemaster@163luckdog.com
Checking for links
Link: <a href="testone@163.com"
Link: <a href='www.163.com@163-com.com'
Link: <a href=http://www.163.com>
Link: <a href = http://sports.163.com>
Link: <a href = "http://gz.house.163.com"
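
The link results above contain the leading part of each tag, because the listing stores the whole match. If only the address itself is wanted, one option is to read capture group 1 of the same pattern. The following is a minimal sketch of that idea; the class name HrefValueSketch and the method name extractHrefValues are made up for illustration and are not part of the original listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefValueSketch {
    // Same pattern as HREF_LINK_REGEX in the listing; group 1 is the href value.
    private static final Pattern HREF = Pattern.compile(
            "(?i)<a\\s+href\\s*=\\s*['\"]?([^'\"\\s>]+)['\"\\s>]");

    // Hypothetical helper: returns only the captured href values.
    public static List<String> extractHrefValues(String htmlText) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(htmlText);
        while (m.find()) {
            result.add(m.group(1)); // group 1 instead of the whole match
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://gz.house.163.com\">a</a>"
                + "<a href = http://sports.163.com>b</a>";
        // prints [http://gz.house.163.com, http://sports.163.com]
        System.out.println(extractHrefValues(html));
    }
}
```

The pattern still assumes that href is the first attribute of the tag, as in the sample HTML; real-world markup is usually better handled by an HTML parser.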

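Example 4:
Finding duplicate words
Requirement: check whether a piece of text contains a duplicated word and, if it does, remove the duplicate.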
// FindWord.java
import java.util.regex.Pattern;

public class FindWord {
    public static void main(String[] args) {
        String[] sentences = new String[] { "this is a normal sentence",
                "Oh,my god!Duplicate word word",
                "This sentence contain no duplicate word words" };
        for (String sentence : sentences) {
            System.out.println("Checking sentence: " + sentence);
            if (containDupWord(sentence)) {
                System.out.println("Duplicate word found!!");
                System.out.println("After removing duplicates: " + removeDupWords(sentence));
            }
            System.out.println("");
        }
    }

    // Replaces each "word word" pair with the single captured word ($1).
    private static String removeDupWords(String sentence) {
        return sentence.replaceAll(Regexes.DUP_WORD_REGEX, "$1");
    }

    // True if the sentence contains a word immediately repeated.
    private static boolean containDupWord(String sentence) {
        return Pattern.compile(Regexes.DUP_WORD_REGEX).matcher(sentence).find();
    }
}

// Regexes.java (this constant is added to the Regexes class from Example 3)
public class Regexes {
    // (\w+) captures a word and \1 requires the same text again after whitespace.
    // The trailing boundary keeps "word words" from counting as a duplicate.
    public static final String DUP_WORD_REGEX
            = "(?<=\\b)(\\w+)\\s+\\1(?=\\b)";
}

Output (the blank line printed after each sentence is omitted here):
Checking sentence: this is a normal sentence
Checking sentence: Oh,my god!Duplicate word word
Duplicate word found!!
After removing duplicates: Oh,my god!Duplicate word
Checking sentence: This sentence contain no duplicate word words
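
One limitation of the pattern in this example is that it only matches a word repeated once, so a single replaceAll pass turns "word word word" into "word word". A possible variation, sketched below, lets the repeated part occur one or more times. The class name DupRunSketch and the method name collapseRepeats are assumptions made for this illustration, and the plain \b form is used in place of the lookarounds.

```java
import java.util.regex.Pattern;

public class DupRunSketch {
    // A word followed by one or more whitespace-separated repeats of itself.
    private static final Pattern DUP_RUN =
            Pattern.compile("\\b(\\w+)(?:\\s+\\1\\b)+");

    // Hypothetical helper: collapses each run of repeats to one occurrence.
    public static String collapseRepeats(String sentence) {
        return DUP_RUN.matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints "word"
        System.out.println(collapseRepeats("word word word"));
        // prints "duplicate word words" (unchanged: "words" is a different word)
        System.out.println(collapseRepeats("duplicate word words"));
    }
}
```

Like the original pattern, this sketch compares words case-sensitively, so "The the" is left untouched.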

This article was collected and compiled by http://www.ositren.com; its address is http://www.ositren.com/htmls/67849.html