I need to cut up paragraphs into their constituent sentences. I also want to account for abbreviations and decimals that contain dots in the middle of sentences.
My simplistic definition: a sentence starts with a capital letter and finishes with one of (.?!), followed by a space and the capital letter of the next sentence, plus something that is NOT one of (.?!). I came up with the following negative look-ahead solution, which works on decimals but fails on abbreviations:
[A-Z]((?![.?!]\s+[A-Z][^.?!]).)+
The inner pattern of the negative look-ahead should fail to match (so the look-ahead itself succeeds) until the engine reaches the sentence-ending position in my definition, but it doesn't.
Example: Just after daybreak in Nags Head on the Outer Banks, about 200 miles northeast of Jacksonville, winds 85.43 miles / hour whipped heavy rain across the resort town. Tall waves covered what had been the beach, and the surf pushed as high as the backs of some of the N.Y. dt. houses and hotels fronting the strand. Lights flickered in one hotel, but the power was still on.
I have found a way to select ANY whole English sentence reliably, regardless of quotation marks or punctuation marks used inside it for abbreviations, decimals or any other purpose! It tests reliably on any non-accented string!
Find a non-accented capital letter that may be preceded by a quotation mark, and check that it is not directly followed by a punctuation mark; this excludes capital-letter abbreviations inside sentences. Then crawl forward by repeating a group consisting of a negative look-ahead and the universal selector character (.) until you arrive at the end of the sentence you are in. You know you are there when you find the sequence of an optional quotation mark (the one closing the pair opened at the start of your sentence), followed by the sentence-closing punctuation mark and the white space that necessarily separates your sentence from the next one. Then you repeat the criteria for the start of a sentence to confirm that what follows is already a new one. Because of the negative condition in the look-ahead, the repeated group (the universal selector, really) does not consume the closing punctuation mark and the optional quotation mark, so you have to match these separately.
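A minimal JavaScript sketch of the approach just described, assuming that A-Z are the only sentence-starting capitals and that straight quotes (" or ') are the only quotation marks. The names START, END, SENTENCE and splitSentences are illustrative, not from the original post, and the end-of-text alternative (\s*$) is an addition so that the last sentence is matched too.

```javascript
// Sketch of the sentence-selection approach described above.
// Assumptions: A-Z are the only sentence-starting capitals and
// straight quotes (" or ') are the only quotation marks.

// Sentence start: optional opening quote, then a capital letter that is
// NOT directly followed by . ? or ! (so "N." in "N.Y." cannot start a sentence).
var START = "[\"']?[A-Z](?![.?!])";

// Sentence end: closing punctuation with an optional quote before or after it.
var END = "[\"']?[.?!][\"']?";

var SENTENCE = new RegExp(
  START +
  // Crawl forward one character at a time ([\s\S] also matches newlines)
  // as long as we are not standing on "end + whitespace + next start".
  "(?:(?!" + END + "\\s+" + START + ")[\\s\\S])*" +
  // The crawl stops before the closing punctuation, so match it here.
  END +
  // What follows must be whitespace plus a new sentence, or the end of the text.
  "(?=\\s+" + START + "|\\s*$)",
  "g"
);

// Returns every sentence found in text (an empty array if there is none).
function splitSentences(text) {
  return text.match(SENTENCE) || [];
}
```

On the Nags Head example this yields three sentences, keeping 85.43 and N.Y. dt. inside their sentences. An abbreviation that is directly followed by a capitalized word (for example "U. S. Army") will still be read as a sentence end, since the rule only inspects the character right after the next capital.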
SUGGESTION FOR FURTHER DEVELOPMENT: Together with the non-accented starting capital letters, you can also use hexadecimal notation to describe accented ANSI capital letters and select sentences in other European languages. But this is not an issue for me at present.
I need to parse an HTML page and pull whatever values are in these JavaScript tags. There will usually be multiple tags with different values between the single quotes. In the next example, the value I need to pull into my array would be 'A728'. Here is some example code:
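Since the example markup isn't shown, this is only a minimal sketch, assuming the values sit between single quotes inside <script> elements. It uses exec() in a loop with the g flag to collect every match rather than just the first; extractQuotedValues is a hypothetical name, and escaped quotes (\') are not handled.

```javascript
// Hypothetical helper: collect every '...' value found inside <script> blocks.
function extractQuotedValues(html) {
  var values = [];
  var scriptRe = /<script\b[^>]*>([\s\S]*?)<\/script>/gi; // lazy: stop at the first </script>
  var quotedRe = /'([^']*)'/g;                            // text between a pair of single quotes
  var script, quoted;
  while ((script = scriptRe.exec(html)) !== null) {
    quotedRe.lastIndex = 0; // start each script body from its beginning
    while ((quoted = quotedRe.exec(script[1])) !== null) {
      values.push(quoted[1]); // group 1 excludes the quotes themselves
    }
  }
  return values;
}
```

For example, on a hypothetical page containing <script>setId('A728');</script>, it would return ["A728"].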
I have a function which validates the password by checking whether it contains a number:
-------------------------------------------------
function findNumeric(str_obj) {
  var regEx = /\d/; // matches any single digit
  return regEx.test(str_obj);
}
--------------------------------------------------
The problem arises when I enter a password with a space in it, e.g. 'test test1': the function returns false. I've tried '\s' in the regEx, but the user can put the space anywhere.
Any idea how to solve this? I should be able to put any alphanumeric value into the password, including spaces.
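With the backslash in place, /\d/ finds a digit anywhere in the string, spaces or not. For a whole-password rule, here is a sketch under the assumption that the allowed characters are exactly ASCII letters, digits and the space character; isValidPassword is a hypothetical name. The character-set check is anchored with ^ and $ so every character is checked, and the digit requirement is a separate test.

```javascript
// true if the string contains at least one digit anywhere (spaces don't matter)
function findNumeric(str_obj) {
  return /\d/.test(str_obj);
}

// Hypothetical whole-password check: only letters, digits and spaces
// (anchored, so every character must be allowed), plus at least one digit.
function isValidPassword(password) {
  return /^[A-Za-z0-9 ]+$/.test(password) && findNumeric(password);
}
```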
I have a variable named "acct". I first want to remove any "-" characters from its value. After that, I want to verify that the variable contains exactly 12 digits and nothing else.
Unfortunately, I'm pretty green when it comes to regular expressions.
/^\d{12}$/.test(acct); should do the second part (the ^ and $ anchors stop it from accepting a longer string that merely contains 12 digits), but how do I do the first?
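A minimal sketch of both steps; isValidAccount is a hypothetical name. replace() with the g flag removes every hyphen (without g it would remove only the first one), and the anchored pattern then rejects anything other than exactly 12 digits.

```javascript
// Hypothetical helper: strip every "-" and then require exactly 12 digits.
function isValidAccount(acct) {
  // The g flag removes all hyphens; without it replace() only removes the first.
  var digits = acct.replace(/-/g, "");
  // ^ and $ anchor the test to the whole string, so 13 digits or any letter fail.
  return /^\d{12}$/.test(digits);
}
```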
which checks that I have at least one lowercase letter, one uppercase letter and one number, and that the string is between 8 and 16 characters long. I adapted this from another source, and it works as intended in all browsers (IE8 is fine) except IE7 and IE6 (oh Microsoft, why do you make my life so hard?).
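The original pattern isn't shown, so this is a guess: a frequently reported cause is a bug in the IE6/IE7 regex engine when a look-ahead is combined with a quantified length check such as .{8,16}. A sketch that sidesteps look-aheads entirely by testing each rule on its own, written in ES3-style JavaScript so it also runs in old IE; isStrongPassword is a hypothetical name, and every character counts toward the length.

```javascript
// Hypothetical checker: each rule is a separate, look-ahead-free test.
function isStrongPassword(str) {
  return str.length >= 8 && str.length <= 16 && // 8 to 16 characters
    /[a-z]/.test(str) &&                        // at least one lowercase letter
    /[A-Z]/.test(str) &&                        // at least one uppercase letter
    /\d/.test(str);                             // at least one digit
}
```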
I'm writing an ECMAScript tokeniser and parser and trying to find out whether I can eliminate switching between tokenising "/" as the start of a regex literal or as the division operator based on parser feedback; essentially, whether I can make the tokeniser independent of the parser. (I have a gut feeling this needs too much special-casing to be worth it.) Code:
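A frequently used alternative to parser feedback is to decide from the previous significant token alone. The sketch below shows that heuristic; the token shape { type, value } and all names are assumptions, and the keyword list is illustrative rather than exhaustive. The comments mark where the previous token cannot settle the question (")", "}", "++", "--", plus contextual words such as yield), which is exactly where extra state or parser feedback would still be needed.

```javascript
// Keywords after which "/" begins a regex literal (illustrative, not exhaustive).
var REGEX_AFTER_KEYWORD = {
  "return": true, "typeof": true, "instanceof": true, "in": true,
  "new": true, "delete": true, "void": true, "throw": true,
  "case": true, "do": true, "else": true
};

// Punctuators after which this sketch assumes division. Only "]" is reliable:
//   ")"  : (a + b) / 2 is division, but if (ok) /x/.test(s) is a regex
//   "}"  : x = {} / 1 is division, but a block followed by /x/.test(s) is a regex
//   "++" / "--" : postfix (a++ / 2) versus prefix operators
var DIVISION_AFTER_PUNCTUATOR = { ")": true, "]": true, "}": true, "++": true, "--": true };

// prev: previous significant token (whitespace/comments skipped) or null.
// Assumed shape: { type: "identifier" | "keyword" | "number" | "string" | "regex" | "punctuator", value: string }
function slashStartsRegex(prev) {
  if (prev === null) return true; // start of input: nothing to divide
  switch (prev.type) {
    case "keyword":
      // "return /x/" is a regex; "this / 2" is a division
      return REGEX_AFTER_KEYWORD.hasOwnProperty(prev.value);
    case "punctuator":
      return !DIVISION_AFTER_PUNCTUATOR.hasOwnProperty(prev.value);
    default:
      // identifiers, numbers, strings and regexes end an operand, so "/" divides
      return false;
  }
}
```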
I have a bunch of text that I want to split into an array of sentences. I have the following code, which works just fine in FF and Chromium but, of course, fails on the pile of *** that is IE: [code]...
It does not produce any errors, but the resulting array often contains empty strings instead of the sentences that should be there. How can I do this in a way that also works in IE?
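Since the original code isn't shown, this is only a possible workaround: IE versions before IE9 are known to implement split() with a regex separator differently from other browsers (for example, dropping empty values and captured groups). The sketch avoids split() altogether and collects sentences with match() and a global regex. It assumes a sentence is any run of characters ending in ., ! or ?, so it will break at decimals and abbreviations, and trailing text without closing punctuation is dropped; toSentences is a hypothetical name.

```javascript
// Collect sentences with match() and a global regex instead of split().
function toSentences(text) {
  var found = text.match(/[^.!?]+[.!?]+/g) || []; // match() returns null when nothing matches
  var result = [];
  for (var i = 0; i < found.length; i++) {
    // Manual trim: String.prototype.trim is missing in IE8 and earlier.
    var s = found[i].replace(/^\s+|\s+$/g, "");
    if (s) result.push(s);
  }
  return result;
}
```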
This needs to match a string and make the following replacements: if the string matches without < or >, replace the match with a space, the replacement string, and another space. If < also matches, do not add the left space. If > matches, do not add the right space. If both < and > match, add neither the leading nor the trailing space.
Old {} String => Old Replacement String
Old {<} String => OldReplacement String
Old {>} String => Old ReplacementString
Old {<>} String => OldReplacementString
This will have to be done a LOT of times, so efficiency is very important. The answer in PHP is below; can anyone help me figure out how to do it in JavaScript? PHP code:
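A minimal JavaScript sketch (not a translation of the PHP code), assuming the placeholder is written literally as {}, {<}, {>} or {<>} and that the whitespace around it belongs to the match, which is what lets < and > remove it. PLACEHOLDER and fillPlaceholders are hypothetical names. The regex is compiled once and all placeholders are handled in a single replace() pass with a callback, which should keep repeated calls cheap.

```javascript
// Matches "{}", "{<}", "{>}" or "{<>}" together with any whitespace around it.
// (<?) and (>?) always participate, capturing either "" or the bracket.
var PLACEHOLDER = /\s*\{(<?)(>?)\}\s*/g;

function fillPlaceholders(str, replacement) {
  return str.replace(PLACEHOLDER, function (match, lt, gt) {
    // "<" suppresses the left space, ">" suppresses the right space.
    return (lt ? "" : " ") + replacement + (gt ? "" : " ");
  });
}
```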
I am trying to write a regex that will parse BBcode into HTML using JavaScript. Everything was going smoothly using the string replace() method with regexes until I got to the list tag. Implementing the list tag itself was fairly easy; handling the list items was not. For some reason, BBcode doesn't define an end tag for a list item. I guess it was designed with bad old HTML 3.2 in mind, where you could make a list like this:
<ul>
<li>item 1
<li>item2
</ul>
However, I need to make this XHTML compliant, so I need to add the </li> tag. Unfortunately, the only way to find where to put it is to find the next [*] (<li>) tag, an opening list tag (in the case of nested lists) or a closing list tag. I was trying to get a rule that handles the list items to work, but it only matches the first item in any list. Here is the line of code:
First, I check that the list item is inside a list. Then I match the [*] tag to find the start of the item, and then I match either the next [*], [list] or [/list] to determine the end of the item. This successfully prevents a list item outside of a list from being turned into an <li> element, but it only matches the first list item in a list. Is there any way to make this match all occurrences of the pattern without looping over the statement until the pattern can no longer be found?
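The original line isn't shown, so this is a sketch of one likely fix. With the g flag, replace() converts every match in a single call, but a common reason a pattern stops after the first item is that each match consumes the tag that ends it, so the next [*] is swallowed and cannot start the following item. A look-ahead checks for the terminating [*], [list] (optionally [list=...]) or [/list] without consuming it. convertListItems is a hypothetical name; the sketch does not verify that an item sits inside a list, and, like the approach described, it closes an item before a nested [list] rather than after it.

```javascript
// Hypothetical converter: turn each [*] item into <li>...</li> in one replace() call.
function convertListItems(bbcode) {
  return bbcode.replace(
    // [*], then the item text (lazy), up to but NOT including the next
    // [*], [list] / [list=...] or [/list]; the look-ahead leaves that tag
    // in place so it can end this item and start (or close) the next one.
    /\[\*\]([\s\S]*?)(?=\[\*\]|\[list(?:=[^\]]*)?\]|\[\/list\])/gi,
    "<li>$1</li>"
  );
}
```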
I have been working on this for a few hours and am frustrated beyond measure. I have tried researching this on the web as well, with no success. I am trying to match certain contents within a wrapper div. For example, suppose the inside of the wrapper div was the following:
<div id="wrapper">
<a href="#">a great link that contains text and symbols</a>
<div... </div>
<div... </div>
</div>
I would like to strip out all the internal divs. But because there can be a lot of internal divs, I figured it would be less processor-intensive to just match the first 'a' tag and repopulate the wrapper div with the match. I am trying to use something like the following regex:
but this is returning the entire contents of the wrapper div. I have tried variations of the regex and either keep getting the entire contents or null. Any help would be greatly appreciated. BTW, I can't match to the first because the contents may be touching (i.e. ...</a><div>...).
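Getting the whole wrapper back is typical of a greedy quantifier such as .*, which runs on to the last possible match of what follows it. A sketch using a lazy quantifier instead; firstLink is a hypothetical name, and [\s\S] is used because . does not match newlines in JavaScript regexes without the s flag. Because it stops at the first </a>, it still works when the next <div> touches the link.

```javascript
// Hypothetical helper: return the first <a ...>...</a> in html, or null.
function firstLink(html) {
  // <a\b avoids matching tags like <abbr>; *? stops at the FIRST </a>
  // instead of running on to the last one as a greedy * would.
  var m = html.match(/<a\b[^>]*>[\s\S]*?<\/a>/i);
  return m ? m[0] : null;
}
```

In a browser, document.getElementById("wrapper").querySelector("a") would give you the element directly without a regex.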