Cut Up A Text Into Sentences With Regex?
Aug 28, 2011
I need to split paragraphs into their constituent sentences. I also want to account for abbreviations and decimals that contain dots in the middle of a sentence.
My simplistic definition: a sentence starts with a capital letter and finishes with one of (.?!), followed by a space and the capital letter of the next sentence, plus something that is NOT one of (.?!). I came up with the following negative look-ahead solution, which works on decimals but fails on abbreviations.
[A-Z]((?![.?!]\s+[A-Z][^.?!]).)+
The pattern inside the negative look-ahead should fail to match (so the look-ahead itself succeeds) until it arrives at the sentence-ending position in my definition, but it doesn't.
Example: Just after daybreak in Nags Head on the Outer Banks, about 200 miles northeast of Jacksonville, winds 85.43 miles / hour whipped heavy rain across the resort town. Tall waves covered what had been the beach, and the surf pushed as high as the backs of some of the N.Y. dt. houses and hotels fronting the strand. Lights flickered in one hotel, but the power was still on.
Aug 30, 2011
I have found a solution that selects ANY whole English sentence reliably, regardless of quotation marks or of punctuation marks used inside the sentence for abbreviations, decimals or any other purpose! It tests reliably on any non-accented string!
["']?[A-Z][^.?!]+((?![.?!]['"]?\s["']?[A-Z][^.?!]).)+[.?!'"]+
EXPLANATION:
In human language it reads as follows:
Find a non-accented capital letter that might be preceded by a quotation mark, and check that it is not directly followed by any punctuation mark, to exclude capital-letter abbreviations inside sentences. Then crawl forward by repeating a group consisting of a negative look-ahead and the universal selector character (the dot) until you arrive at the end of the sentence you are in. You know you are there when you find the sequence of the sentence-closing punctuation mark, a possible quotation mark (the one closing its pair at the start of your sentence), and the white space that necessarily separates your sentence from the next one. Then you repeat the criteria for the start of a sentence to check that a new one really begins there. Because of the negative condition in the look-ahead, the repeated group (really just the universal selector) does not consume the closing punctuation mark and the possible quotation mark, so you have to match these separately.
SUGGESTION FOR FURTHER DEVELOPMENT: Together with the non-accented capital letters at the start, you can also use hexadecimal notation to describe accented ANSI capital letters, in order to select sentences in other European languages. But this is not an issue for me at present.
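A quick way to try the pattern above in JavaScript is sketched here. The sample text is a shortened variant of the example paragraph, the variable names are illustrative, and the `\s` escape is assumed to be what the pattern intended.

```javascript
// Sentence pattern from the post, written as a regex literal with the global flag.
const sentencePattern = /["']?[A-Z][^.?!]+((?![.?!]['"]?\s["']?[A-Z][^.?!]).)+[.?!'"]+/g;

// Shortened sample text (an assumption, adapted from the example paragraph).
const text = "Winds of 85.43 miles / hour whipped heavy rain across the town. " +
  "The surf pushed as high as some of the N.Y. dt. houses fronting the strand. " +
  "Lights flickered in one hotel.";

// match() with the g flag returns every whole match, or null if there is none.
const sentences = text.match(sentencePattern) || [];
sentences.forEach((sentence, index) => console.log(index + 1, sentence));
```

Note that an abbreviation directly followed by a capitalised word (for example "N.Y. Times") would still be treated as a sentence boundary by this pattern.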
Mar 26, 2009
How do I find certain text between two specified strings? In my example I have:
<asset:search
type="asset type"
[subtype="asset subtype"][code]...
I need to search through pages of code for "localfields=" between "<asset:search" and "/>".
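One possible approach is sketched below. It assumes that no attribute value inside the tag contains a ">" character; the sample page source and the variable names are made up for illustration.

```javascript
// Hypothetical page source with two tags, only one of which has localfields=.
const page = '<asset:search\n type="article"\n localfields="title,body" />\n' +
  '<asset:search type="image" />';

// [^>]* cannot cross a ">", so each match stays inside a single tag,
// even when the tag is spread over several lines.
const tagPattern = /<asset:search\b[^>]*\blocalfields=[^>]*\/>/g;

const hits = page.match(tagPattern) || [];
console.log(hits.length);
```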
May 21, 2009
I need to parse an HTML page and pull out whatever values are in these JavaScript tags. There will usually be multiple tags with different values between the single quotes. The value in the next example that I need to pull into my array would be 'A728'. Here is some example code..
Apr 17, 2010
I'm receiving a piece of HTML from which I should get all the TD tags. For example, I receive the following:
<tr>
<td>name</td> <td>surname</td>
</tr>
Then I must look for "<td>[anything]</td>" with a regex and build an array containing the text inside the tags, like
tags[0] = "name";
tags[1] = "surname";
So... I did this:
Code:
html = "<tr><td>name</td> <td>surname</td></tr>";
var reg = new RegExp('<td[^>]*>(.*)</td>', 'gim');
var matches = html.match(reg);
The problem is that I'm getting just ONE array element with the value:
"<td>name</td> <td>surname</td>"
instead of two values (one for name and another for surname). I tested a lot of different regexes and also some string methods, but I cannot make it work.
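The single match is what a greedy `(.*)` produces: it runs to the last `</td>` in the string. A minimal sketch of one fix, using a lazy quantifier and collecting the capture group with `exec()`, might look like this (the variable names are illustrative):

```javascript
const html = "<tr><td>name</td> <td>surname</td></tr>";

// .*? is lazy, so each match stops at the nearest closing </td>.
const cellPattern = /<td[^>]*>(.*?)<\/td>/gi;

const tags = [];
let cell;
// exec() with the g flag walks through the string one match at a time.
while ((cell = cellPattern.exec(html)) !== null) {
  tags.push(cell[1]); // capture group 1 holds the text inside the tag
}
console.log(tags);
```

Calling `html.match()` with a global regex returns only the whole matches, which is why the loop with `exec()` is used to reach the captured text.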
Sep 15, 2005
I am trying to use a regEx to grab Function names and function parameter names from a text entry.
The script is written in javascript and I expect the functions to be in javascript syntax.
For example the code might look like:
Code:
function myFunction1(param1,param2,param3){
some code
}
function myFunction2();
function myFunction3(param);
What's the best way to accomplish grabbing the function names and parameters?
Should I be breaking it down into multiple regular expressions?
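A single expression can capture both parts, with the parameter list split afterwards. The sketch below assumes simple declarations like the ones shown (no default values, comments or nested parentheses in the parameter list); the names `functionPattern` and `found` are illustrative.

```javascript
const source = `function myFunction1(param1,param2,param3){
some code
}
function myFunction2();
function myFunction3(param);`;

// Group 1: the function name. Group 2: everything between the parentheses.
const functionPattern = /function\s+([A-Za-z_$][\w$]*)\s*\(([^)]*)\)/g;

const found = [];
let match;
while ((match = functionPattern.exec(source)) !== null) {
  const params = match[2]
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0); // an empty list yields no parameters
  found.push({ name: match[1], params: params });
}
console.log(found);
```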
Jul 23, 2005
I have a function which validates the password if there is a number:
-------------------------------------------------
function findNumeric(str_obj){
regEx = /\d/;
if (str_obj.match(regEx))
return true;
else
return false;
}
--------------------------------------------------
The problem arises when I enter a password with a space in it, e.g.
'test test1'. The function returns false. I've tried '\s' in the
regEx, but the user can put the space anywhere.
Any idea how to solve this problem? I should be able to put any
alphanumeric value into the password, including spaces.
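One way to read the requirement is as two separate checks: the password contains a digit, and it consists only of letters, digits and spaces. The sketch below follows that reading, which is an assumption; `isAlphanumericWithSpaces` is a hypothetical helper name.

```javascript
// True when the string contains at least one digit anywhere.
function findNumeric(password) {
  return /\d/.test(password);
}

// True when every character is a letter, a digit or a space.
function isAlphanumericWithSpaces(password) {
  return /^[A-Za-z0-9 ]+$/.test(password);
}

console.log(findNumeric("test test1"), isAlphanumericWithSpaces("test test1"));
```

Because `/\d/` is not anchored, the position of the space does not affect the digit check.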
Jul 23, 2005
I have a variable named "acct". I first want to remove any "-" characters
from its value. After this I want to verify that we have exactly 12
digits in the variable.
Unfortunately I'm pretty green as far as using RegEx goes.
/\d{12}/.test(acct); should do the second part, but how do I do the first?
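Both steps can be sketched as follows. Note that the check needs the `^` and `$` anchors, otherwise a longer string that merely contains 12 digits would also pass; `isValidAccount` is an illustrative name.

```javascript
function isValidAccount(acct) {
  // Step 1: remove every "-" (the g flag replaces all occurrences).
  const digitsOnly = acct.replace(/-/g, "");
  // Step 2: the whole remaining string must be exactly 12 digits.
  return /^\d{12}$/.test(digitsOnly);
}

console.log(isValidAccount("1234-5678-9012"));
```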
Jan 9, 2006
Basically i want to get the current url, and then replace http:// with
something else.
Here is the current code.
var current_url = window.document.location;
var re = new RegExp("http://", "g");
if(re.test(current_url)) {
me = current_url.replace(re,"http://www.addme.com/");
window.alert("found :: " + me + " :: " + current_url);
} else {
window.alert("not");}
If my page was http://www.google.com I'd get the alert:
found :: undefined :: http://www.google.com.
I don't understand why I am getting undefined when re.test() works.
Surely that means the regex is correct.
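One likely explanation is that `window.document.location` is a Location object rather than a string. `re.test()` converts it to a string, so the test succeeds, but calling `replace()` on it invokes the Location object's own `replace()` method, which is meant for navigation and returns nothing. A sketch that works on a plain string instead (in a browser you would pass `window.location.href`; `rewriteUrl` is a hypothetical name):

```javascript
// Takes the URL as a plain string, e.g. window.location.href in a browser.
function rewriteUrl(currentUrl) {
  const re = /^http:\/\//;
  // String() guarantees that String.prototype.replace is the method called.
  return String(currentUrl).replace(re, "http://www.addme.com/");
}

console.log(rewriteUrl("http://www.google.com"));
```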
Jun 15, 2006
Trying to match the entire following object literal code using a RegEx.
var Punctuators = { '{' : 'LeftCurly', '}' : 'RightCurly' }
Variations on the idea of using /var.*{.*}/ of course stops at the
first }.
May 9, 2007
I was using the following code:
element.value = element.value.replace(/ /g,'');
to remove all the spaces in a string.
However, IE6 complained with an "Expected ')'" error.
How can I tell IE6 to replace just spaces (i.e. not using \s)?
I tried / / and /[ ]/ but neither of them worked either.
May 18, 2007
I need to strip everything from a file except what is between <body>
and </body>
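A minimal sketch, assuming the file content is already available as a string and contains a single body element (`extractBody` is an illustrative name):

```javascript
function extractBody(html) {
  // [\s\S] matches any character including newlines; *? stops at the first </body>.
  const found = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  return found ? found[1] : null; // null when there is no body element
}

const sample = '<html><head></head><body class="x">\n<p>Hi</p>\n</body></html>';
console.log(extractBody(sample));
```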
Oct 13, 2009
OK, here's a regex:
/^(?=.*\d)(?=.*[A-Z])(?=.*[a-z]).{8,16}$/
It checks that I have at least one lowercase letter, one uppercase letter and one number, and that the string is between 8 and 16 characters long. I adapted this from another source and it works as intended in all other browsers (IE8 is fine), but not in IE6 or IE7 (oh Microsoft, why do you make my life so hard).
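If the look-aheads are what the older Internet Explorer versions trip over (an assumption, not something confirmed here), one workaround is to avoid them altogether and run the checks separately. `isValidPassword` is a hypothetical name.

```javascript
function isValidPassword(password) {
  return password.length >= 8 &&
    password.length <= 16 &&
    /\d/.test(password) &&    // at least one digit
    /[A-Z]/.test(password) && // at least one uppercase letter
    /[a-z]/.test(password);   // at least one lowercase letter
}

console.log(isValidPassword("Passw0rdOK"));
```

Unlike the original pattern, this version does not reject line breaks inside the string, since it no longer relies on the dot.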
Oct 23, 2005
I'm writing an ECMAScript tokeniser and parser and trying to find out if I can eliminate the switching from tokenising "/" as start of regex or the division operator depending on the parser feedback - essentially, if I can make the tokeniser independent of the parser. (I have a gut feeling this needs too much special casing to be worth it). Code:
Jun 27, 2010
I have been playing with this regex for a few hours now. I want to make it accept commas as well.
At the moment it works with A-z and - . ' but can't seem to figure out how to include commas.
Jun 21, 2011
I have a bunch of text that I want to split into an array of sentences. I have the following code that works just fine on FF and Chromium, but ofc has to fail on the pile of *** that is IE [code]...
It does not produce any errors, but the resulting array often has empty strings as values instead of the sentences that should be there. How can I do this in a way that also works in IE?
Oct 27, 2004
i have the following regex:
(\s*{\s*(<?)\s*(>?)\s*}\s*)
this needs to be able to match a string and make the following replacements:
if the string matches without < or >, replace the match with a space, a replacement string, and another space. if < matches also, do not add the left space. if > matches, do not add the right space. if < and > match, do not add the beginning or ending space
Old {} String => Old Replacement String
Old {<} String => OldReplacement String
Old {>} String => Old ReplacementString
Old {<>} String => OldReplacementString
This will have to be done a LOT of times, so efficiency is very important. The answer in PHP is below. Can anyone help me figure out how to do it in JavaScript? PHP Code:
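The PHP version is not reproduced here. A JavaScript sketch of the same rules, using a replacer function and with the braces escaped, could look like this (`fillPlaceholders` and its parameters are hypothetical names):

```javascript
// Compiled once and reused, since the replacement runs many times.
const placeholderPattern = /\s*\{\s*(<?)\s*(>?)\s*\}\s*/g;

function fillPlaceholders(text, replacement) {
  return text.replace(placeholderPattern, (whole, noLeft, noRight) => {
    const left = noLeft ? "" : " ";   // "<" present: no space on the left
    const right = noRight ? "" : " "; // ">" present: no space on the right
    return left + replacement + right;
  });
}

console.log(fillPlaceholders("Old {<} String", "Replacement"));
```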
Dec 21, 2004
In have a string of data like so:
<div id="feedback">
<p>[DEC 12th Anthony]I like it[DEC 12th Anthony]I agree</p>
</div>
I'm trying to use regex to add a <br /> before each item in hard brackets so the comments are broken out. Here's what I've tried.
re = /(\[.*\])/gi;
vTemp = aSourceObject.innerHTML.replace(re,"<br />$1");
What I end up getting is:
<div id="feedback">
<p><br />[DEC 12th Anthony]I like it[DEC 12th Anthony]I agree</p>
</div>
It gets it right, but only for the first item, not the second one. If I tell it to put the <br /> after then I get
<p>[DEC 12th Anthony]I like it[DEC 12th Anthony]<br />I agree</p>
So it's like it's reading the entire section in brackets as one match instead of 2 separate matches.
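That behaviour is what a greedy `.*` gives: it runs from the first "[" to the last "]". A sketch that matches each bracketed item on its own, using a negated character class in place of `.*` (the sample string is taken from the post, the variable names are illustrative):

```javascript
const source = "<p>[DEC 12th Anthony]I like it[DEC 12th Anthony]I agree</p>";

// [^\]]* cannot run past a closing bracket, so each [...] is a separate match.
const itemPattern = /(\[[^\]]*\])/g;

const result = source.replace(itemPattern, "<br />$1");
console.log(result);
```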
Nov 15, 2006
I need information about JavaScript and regular expressions. Please suggest any book or tutorial web site.
Jul 21, 2010
How would I get this variable to allow whitespace?
var illegalChars = /\W/; // allow letters, numbers, and underscores
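One option, assuming "whitespace" means anything `\s` matches (spaces, tabs, line breaks), is a negated character class that treats both word characters and whitespace as legal:

```javascript
// Matches any character that is neither a word character nor whitespace.
const illegalChars = /[^\w\s]/; // allow letters, numbers, underscores and whitespace

console.log(illegalChars.test("hello world_1")); // nothing illegal in this string
console.log(illegalChars.test("hello!"));        // "!" is illegal
```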
Jul 23, 2005
I need to write a function that will remove a specified parameter from a
URL. For example:
removeParam("param1", "http://mysite.com/mypage.htm?param1=1&param2=2");
would return:
"http://mysite.com/mypage.htm?param2=2"
I'm thinking that string.replace(/regex/, ""); would do the trick, but how
do I construct a correct regex?
I see a problem if the parameter name ("param1") happens to contain any
characters that have a special meaning in a regular expression.
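A sketch of one way to do it: escape the parameter name before building the expression, then decide in a replacer function which separator to keep. It assumes every parameter has the form name=value; `escapeRegExp` is a helper defined here, not a built-in.

```javascript
// Puts a backslash in front of every character that is special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function removeParam(name, url) {
  // Leading "?" or "&", the name, "=", the value, and an optional trailing "&".
  const pattern = new RegExp("([?&])" + escapeRegExp(name) + "=[^&#]*(&)?");
  return url.replace(pattern, (whole, lead, trailingAmp) =>
    // If more parameters follow, keep the leading separator; otherwise drop it.
    trailingAmp ? lead : ""
  );
}

console.log(removeParam("param1", "http://mysite.com/mypage.htm?param1=1&param2=2"));
```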
Jul 23, 2005
I don't know where the actual issue is, but hopefully someone can explain.
The following displays "5" in FireFox, but "3" in IE:
<script type="text/javascript" language="javascript">
var newString = ",a,b,c,";
var treeArray = newString.split(/,/i);
alert(treeArray.length);
</script>
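Older versions of Internet Explorer are widely reported to drop empty strings when splitting on a regular expression, which would account for the difference. When the separator is a fixed string, as it is here, one way to sidestep the issue is to split on the string itself:

```javascript
const newString = ",a,b,c,";

// A string separator keeps the empty first and last elements.
const treeArray = newString.split(",");

console.log(treeArray.length);
```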
Sep 28, 2005
I have a string I have to parse
AB1.2CD34
I need to split the string into groups of letters and numbers..
"AB" "1.2" "CD" "34"
What is the best way of doing this ?
I've looked at string.split using a regex, but that doesn't output the
delimiters.
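Rather than splitting, one option is to match the groups directly. The sketch below assumes that numbers are runs of digits with an optional decimal part; the variable names are illustrative.

```javascript
const input = "AB1.2CD34";

// Either a run of letters, or digits with an optional ".digits" part.
const groupPattern = /[A-Za-z]+|\d+(?:\.\d+)?/g;

const groups = input.match(groupPattern) || [];
console.log(groups);
```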
Dec 14, 2005
I am trying to write a regex that will parse BBcode into HTML using
JavaScript. Everything was going smoothly using the string class
replace() operator with regex's until I got to the list tag.
Implementing the list tag itself was fairly easy. What was not was
trying to handle the list items. For some reason, in BBcode, they
didn't bother defining an end tag for a list item. I guess that they
designed it with bad old HTML 3.2 in mind where you could make a list
by using:
<ul>
<li>item 1
<li>item2
</ul>
However, I need to make this XHTML compliant, so I needed to add the
</li> tag into the mix. Unfortunately, the only way to find where to
put it is to find the next[*] (<li>) tag or an open list (in the case
of nested lists) or close list tag. I was trying to get a rule that
handles the list items to work, but it only matches the first item in
any list. Here is the line of code:
bbcode =
bbcode.replace(/\[list(=1|=a|)\](.*?)\[\*\](.*?)(\[\*\]|\[list\]|\[\/list\])/g,
'[list$1]$2<li>$3</li>$4');
First, I check to make sure that the list item is inside a list. Then
I match the [*] tag to find the start of the item, then I match either
the next [*], [list], or [/list] to determine the end of the item.
This successfully prevents a list item outside of a list from being
made into a <li> element, but only matches the first list item in a
list. Is there any way to make this match all occurrences of this
pattern without looping over the statement until the pattern can no
longer be found?
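One way to avoid the loop is to work in two stages: match each whole list first, then convert the items inside it with a second replace that uses a look-ahead, so the terminating tag is not consumed. This is only a sketch for lists that are not nested; `convertListItems` is a hypothetical name.

```javascript
function convertListItems(bbcode) {
  // Stage 1: find each complete [list]...[/list] block.
  return bbcode.replace(/\[list(=1|=a|)\]([\s\S]*?)\[\/list\]/g, (whole, kind, body) => {
    // Stage 2: inside the block, an item runs until the next [*] or the end of the body.
    const items = body.replace(/\[\*\]([\s\S]*?)(?=\[\*\]|$)/g, "<li>$1</li>");
    return "[list" + kind + "]" + items + "[/list]";
  });
}

console.log(convertListItems("[list][*]one[*]two[/list] [*]stray"));
```

Items outside a list are left alone because the inner replace only ever sees the body of a matched list.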
Jan 3, 2007
I am trying to parse an HTML page and want to replace the input element. The following code fails all the time.
var ex = "<input type="hidden" name="__VIEWSTATE"
id="__VIEWSTATE"
value="/wEPDwULLTE2NjEyNTI0MThkGAEFEHNlY3Rpb25zR3JpZFZpZXc PZ2QN271==
/>";
var regEx = new RegExp("<s*input[^>]*>(.*?)s*/");
if (ex.match( regEx))
{
alert('match');
}
else
{
alert ('no match');
}
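Two things commonly go wrong in code like this: double quotes inside a double-quoted string end the string early, and a backslash inside a string passed to `new RegExp` has to be doubled. A sketch that avoids both by using single quotes and a regex literal (the shortened `value` attribute is a stand-in, not the real view state):

```javascript
// Single quotes on the outside let the attribute quotes appear unescaped.
const ex = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123==" />';

// A regex literal needs no doubled backslashes; [^>]* runs up to the closing ">".
const inputPattern = /<\s*input[^>]*>/i;

const matched = inputPattern.test(ex);
const cleaned = ex.replace(inputPattern, "");
console.log(matched ? "match" : "no match");
```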
May 18, 2007
I have been working on this for a few hours and am frustrated
beyond all extent. I have tried to research this on the web as well
with no success. I am trying to match certain contents within a
wrapper div. So for example if the inside of the wrapper div was the
following:
<div id="wrapper">
<a href="#">a great link that contain text and symbols</a>
<div... </div>
<div... </div>
</div>
I would like to strip out all the internal divs. But because there
can be a lot of internal divs, I figured it would be less processor
intensive to just match the first 'a' tag and repopulate the wrapper
div with the match. I am trying to use something like the following
regex:
re = /^<a(.+)<\/a>/;
with the following statement:
$temp = document.getElementById('wrapper').innerHTML.match (re);
but this is returning the entire contents of the wrapper div. I have
tried variations of the regex and either continue to get the entire
contents or null returns. Any help would greatly be appreciated.
BTW, I can't match to the first because the contents may be touching (ie ...</a><div>...).
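A sketch of one approach: drop the `^` anchor (the wrapper's innerHTML may start with whitespace) and use a lazy quantifier so the match ends at the first `</a>`. The `inner` string below stands in for the wrapper's innerHTML and is made up for illustration.

```javascript
// Stand-in for document.getElementById('wrapper').innerHTML.
const inner = '\n<a href="#">a great link</a><div>one</div>\n<div>two</div>\n';

// [\s\S]*? is lazy and also crosses line breaks, stopping at the first </a>.
const linkPattern = /<a\b[^>]*>[\s\S]*?<\/a>/i;

const found = inner.match(linkPattern);
const firstLink = found ? found[0] : null; // element 0 is the whole match
console.log(firstLink);
```

Without the g flag, `match()` returns an array whose first element is the matched text, so that element (rather than the array itself) is what should be written back into the wrapper.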