[solved] Parsing strings with regular expressions - Match word with character #

Talk about writing modules and plugins for CMS Made Simple, or about specific core functionality. This board is for PHP programmers who are contributing to CMSMS, not for site developers.
nhaack


Post by nhaack »

Hi there,

I am currently working on some enhancements for the Twitter plug-in. I want it to turn URLs, usernames and hashtags into links using regular expressions. My solution works fine for the URLs, but for usernames and hashtags I run into the same problem. I am already halfway there; I just can't put my finger on this one.

I wrote a little function for that:



function twitter_message_enhancer($message){
	// 1) Turn URLs (scheme://...) into links
	$twitter_message_enhancer = eregi_replace("([[:alnum:]]+)://([^[:space:]]*)([[:alnum:]#?/&=])","<a href=\"\\1://\\2\\3\" target=\"_blank\">\\1://\\2\\3</a>", $message); 
	// 2) Link @username to the user's Twitter page
	$twitter_message_enhancer = eregi_replace('([@]([[:alnum:]]+))', '<a href="http://twitter.com/\2" target="_blank">\1</a>', $twitter_message_enhancer); 
	// 3) Link #hashtag to a Twitter search for that tag
	$twitter_message_enhancer = eregi_replace('([#]([[:alnum:]]+))', '<a href="http://search.twitter.com/search?q=%23\2" target="_blank">\1</a>', $twitter_message_enhancer); 
	return $twitter_message_enhancer;
}
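Note that eregi_replace() was deprecated in PHP 5.3 and removed in PHP 7, so a function like the one above does not run on current PHP versions. Below is a minimal sketch of the same three rules ported to preg_replace(), assuming a one-to-one translation of the POSIX patterns (PCRE also understands [[:alnum:]] inside a character class). Because the patterns are unchanged, it still has the mid-word matching problem discussed in this thread.

```php
<?php
// One-to-one port of the three eregi_replace() rules to preg_replace().
// "~" is used as the delimiter because "#" appears inside the patterns.
function twitter_message_enhancer($message)
{
    // 1) Turn URLs (scheme://...) into links.
    $out = preg_replace(
        '~([[:alnum:]]+)://([^[:space:]]*)([[:alnum:]#?/&=])~i',
        '<a href="$1://$2$3" target="_blank">$1://$2$3</a>',
        $message
    );
    // 2) Link @username to the user's Twitter page (still matches mid-word).
    $out = preg_replace(
        '~(@([[:alnum:]]+))~i',
        '<a href="http://twitter.com/$2" target="_blank">$1</a>',
        $out
    );
    // 3) Link #hashtag to a Twitter search (still matches mid-word).
    $out = preg_replace(
        '~(#([[:alnum:]]+))~i',
        '<a href="http://search.twitter.com/search?q=%23$2" target="_blank">$1</a>',
        $out
    );
    return $out;
}
```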

It already works in most cases. My problem is that I don't know how to match only when the word being checked begins with "#" or "@".



([#]([[:alnum:]]+)) 

just matches any part of the string that begins with the character "#". For example, in "word1 word2 #word3" it replaces #word3, but in "wor#d1 word2 #word3" #d1 gets replaced as well. I thought of requiring a space in front as a matching criterion, but then... what if a message begins with a "#" and there is no leading whitespace?
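To make the problem concrete, here is a small sketch that runs the same unanchored pattern through preg_match_all() (used instead of the ereg functions as an assumption, since those no longer exist in PHP 7). Both "d1" and "word3" are captured:

```php
<?php
// The unanchored hashtag pattern matches a "#" anywhere, even mid-word.
$text = 'wor#d1 word2 #word3';
preg_match_all('/#([[:alnum:]]+)/', $text, $m);
// $m[1] now holds both captured tags: 'd1' (unwanted) and 'word3'.
$tags = $m[1];
```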

I tried using ^ to indicate that the string should start with "#" as follows:



(^[#]([[:alnum:]]+)) or 
([^#]([[:alnum:]]+)) 

but neither works. I suspect this is the wrong approach, since the matched sub-element would always begin with a "#" anyway. Is there a way to indicate the beginning of a word in the pattern?
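One way to express "beginning of a word", assuming you can switch from the POSIX ereg functions to PCRE's preg_replace() (POSIX regular expressions have no lookbehind), is a negative lookbehind. (?<!\S) means "not preceded by a non-whitespace character", which holds both at the very start of the string and right after a space, so one rule covers both cases. For reference: a ^ outside brackets anchors to the start of the whole string, while a ^ as the first character inside brackets, as in [^#], negates the class ("any character except #"). The word-boundary escape \b does not help here either, because "#" is not a word character. The function name link_hashtags() is hypothetical:

```php
<?php
// Sketch (PCRE, not POSIX ereg): link "#tag" only at the start of a word.
// (?<!\S) is a negative lookbehind: the "#" must not follow a non-whitespace
// character, i.e. it is at the start of the string or after whitespace.
function link_hashtags($text)
{
    return preg_replace(
        '/(?<!\S)#([[:alnum:]]+)/',
        '<a href="http://search.twitter.com/search?q=%23$1" target="_blank">#$1</a>',
        $text
    );
}

// "#word0" and "#word3" are linked; "wor#d1" is left untouched.
echo link_hashtags('#word0 wor#d1 word2 #word3'), "\n";
```

The same lookbehind works for the "@" rule by swapping the "#" for "@" and pointing the link at the user page.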

I am clueless about this at the moment  ???

As I said, it works fine in most cases. In rare situations, however, Twitter users post URLs that contain a "#". In that case, the first rule links the URL, and then the hashtag rule replaces the part after the "#" with another link, so you end up with two interwoven links, which obviously doesn't work. The same applies to @username elements and posted e-mail addresses.
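Applying a word-start lookbehind to both the "@" and "#" rules is one possible fix for the interwoven links: a "#" inside an already-linked URL, or the "@" in an e-mail address, is preceded by a non-space character and is therefore left alone. This is a sketch assuming preg_replace() is available, not a tested replacement for the plug-in's code; enhance_tweet() is a hypothetical name.

```php
<?php
// Sketch: the three rules with a word-start lookbehind on "@" and "#".
// enhance_tweet() is a hypothetical name, not the plug-in's function.
function enhance_tweet($message)
{
    // 1) URLs first; any "#" or "@" inside a URL is preceded by non-space
    //    characters, so rules 2 and 3 below skip it.
    $out = preg_replace(
        '~([[:alnum:]]+)://([^[:space:]]*)([[:alnum:]#?/&=])~',
        '<a href="$1://$2$3" target="_blank">$1://$2$3</a>',
        $message
    );
    // 2) @username only at the start of the text or after whitespace.
    $out = preg_replace(
        '~(?<!\S)@([[:alnum:]]+)~',
        '<a href="http://twitter.com/$1" target="_blank">@$1</a>',
        $out
    );
    // 3) #hashtag only at the start of the text or after whitespace.
    $out = preg_replace(
        '~(?<!\S)#([[:alnum:]]+)~',
        '<a href="http://search.twitter.com/search?q=%23$1" target="_blank">#$1</a>',
        $out
    );
    return $out;
}

// The URL and "#news" are linked; "#top" and "nils@example.com" are not.
echo enhance_tweet('read http://example.com/page#top or mail nils@example.com #news'), "\n";
```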

As a Plan B, I could explode the message at spaces, walk through the resulting array of words replacing each word that matches, and afterwards implode it again with a space as glue, but that is not very elegant ;)
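For comparison, the Plan B could look roughly like this. It is a sketch with a hypothetical helper name, link_words(); each word is tested with a pattern anchored at both ends, so "wor#d1" is skipped while "#word3" is linked:

```php
<?php
// Sketch of the explode/implode "Plan B": split at spaces, link each word
// that is exactly "#tag" or "@user", then glue the words back together.
function link_words($message)
{
    $words = explode(' ', $message);
    foreach ($words as $i => $word) {
        if (preg_match('/^#([[:alnum:]]+)$/', $word, $m)) {
            $words[$i] = '<a href="http://search.twitter.com/search?q=%23' . $m[1]
                . '" target="_blank">' . $word . '</a>';
        } elseif (preg_match('/^@([[:alnum:]]+)$/', $word, $m)) {
            $words[$i] = '<a href="http://twitter.com/' . $m[1]
                . '" target="_blank">' . $word . '</a>';
        }
    }
    return implode(' ', $words);
}
```

One limitation of this approach: a word with trailing punctuation, such as "#news,", does not match the anchored pattern and stays unlinked.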

Best
Nils
Last edited by nhaack on Mon Dec 29, 2008 6:16 pm, edited 1 time in total.
JeremyBASS

Re: Parsing strings with regular expressions - Match word beginning with character #

Post by JeremyBASS »

Just a thought, based on "wor#d1 word2 #word3":



/^[#][[:alnum:]]+$/

so



(^[#]([[:alnum:]]+$))


??

I didn't get to test it.

cheers
Last edited by JeremyBASS on Mon Dec 29, 2008 11:03 am, edited 1 time in total.
nhaack

Re: Parsing strings with regular expressions - Match word beginning with character #

Post by nhaack »

Hi Jeremy,

Thanks for the hint. I tried it with the $ at the end, but that turned out to be too strict: with both ^ and $, the pattern only matches when the whole message is a single hashtag. Still, it pointed me in the right direction ;)
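A quick way to see why ^...$ is too strict, sketched with preg_match() (assumed here in place of the ereg functions): with both anchors, the pattern only matches when the entire string is one hashtag.

```php
<?php
// Anchored at both ends: matches only if the WHOLE string is "#tag".
$pattern = '/^#[[:alnum:]]+$/';
var_dump(preg_match($pattern, '#word3'));             // int(1)
var_dump(preg_match($pattern, 'word1 word2 #word3')); // int(0)
```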

In the end, I'll use two eregi_replace() calls for the hashtags, and likewise for the "@" in front of usernames.

First



( [#]([[:alnum:]]+))

to match hashtags or usernames within the text that begin a "new" word (when they have a space in front).

Second



(^[#]([[:alnum:]]+)) 

to match a hashtag or username right at the beginning of the string.
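The two-rule approach, sketched with preg_replace() instead of eregi_replace() (an assumption for current PHP; the function name link_hashtags_two_rules() is hypothetical). Since the mid-text rule consumes the leading space, the replacement puts it back:

```php
<?php
// Sketch of the two-rule approach: one rule for a hashtag at the start of
// the string, one for a hashtag preceded by a space.
function link_hashtags_two_rules($text)
{
    $link = '<a href="http://search.twitter.com/search?q=%23$1" target="_blank">#$1</a>';
    // Rule 1: hashtag at the very beginning of the string.
    $text = preg_replace('/^#([[:alnum:]]+)/', $link, $text);
    // Rule 2: hashtag after a space; the consumed space is re-inserted.
    $text = preg_replace('/ #([[:alnum:]]+)/', ' ' . $link, $text);
    return $text;
}
```

Note that the second rule only looks for a literal space, so a hashtag after a tab or line break is not linked.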

I found a great site for evaluating regular expressions (it is in German though): http://regexp-evaluator.de/

Thanks and best
Nils
Last edited by nhaack on Mon Dec 29, 2008 5:06 pm, edited 1 time in total.
