[solved] Parsing strings with regular expressions - Match word with character #
Posted: Mon Dec 29, 2008 3:06 am
Hi there,
I am currently working on some enhancements for the twitter plug-in. I want it to turn URLs, usernames and hashtags into links using regular expressions. My solution works fine for the URLs, but for usernames and hashtags I keep running into the same problem. I am already halfway there; I just can't put my finger on this one.
I wrote a little function for that:
Code:
function twitter_message_enhancer($message){
    // Turn URLs (scheme://...) into links
    $twitter_message_enhancer = eregi_replace("([[:alnum:]]+)://([^[:space:]]*)([[:alnum:]#?/&=])","<a href=\"\\1://\\2\\3\" target=\"_blank\">\\1://\\2\\3</a>", $message);
    // Link @usernames to their Twitter profile
    $twitter_message_enhancer = eregi_replace('([@]([[:alnum:]]+))', '<a href="http://twitter.com/\2" target="_blank">\1</a>', $twitter_message_enhancer);
    // Link #hashtags to a Twitter search
    $twitter_message_enhancer = eregi_replace('([#]([[:alnum:]]+))', '<a href="http://search.twitter.com/search?q=%23\2" target="_blank">\1</a>', $twitter_message_enhancer);
    return $twitter_message_enhancer;
}
It already works in most cases. My problem is that I don't know how to match only when the checked string element begins with a "#" or "@".
Code:
([#]([[:alnum:]]+))
just matches any part of the string that begins with the character "#". For example, in "word1 word2 #word3" it replaces #word3, but in "wor#d1 word2 #word3" #d1 gets replaced as well. I thought of requiring a leading space as a matching criterion, but then what if a message begins with a "#" and there is no leading whitespace?
I tried using ^ to indicate that the string should start with "#" as follows:
Code:
(^[#]([[:alnum:]]+)) or
([^#]([[:alnum:]]+))
but it doesn't work. I think this is the wrong approach, since the matched part, as a sort of sub-element of the message, would most likely always begin with a "#" anyway. Is there a way to indicate the beginning of a word in the parsing process?
I am clueless about this at the moment.
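For the word-start question, one possible direction is sketched here, under a few assumptions. It swaps eregi_replace for preg_replace (PCRE), because POSIX ereg patterns have no lookbehind and the ereg functions were later removed from PHP (7.0). The function name link_hashtags_sketch is made up for illustration. The idea is a negative lookbehind: the "#" only counts if the character before it is not a letter or digit, which is also true at the very start of the message.

```php
<?php
// Sketch only: PCRE (preg_replace) instead of the eregi_replace used in the post.
// The function name is hypothetical.
function link_hashtags_sketch($message) {
    // (?<![[:alnum:]]) : "#" must NOT be preceded by a letter or digit.
    // This also succeeds at the very start of the message, so no leading
    // space is required.
    return preg_replace(
        '/(?<![[:alnum:]])#([[:alnum:]]+)/',
        '<a href="http://search.twitter.com/search?q=%23$1" target="_blank">#$1</a>',
        $message
    );
}

echo link_hashtags_sketch("wor#d1 word2 #word3"), "\n";
echo link_hashtags_sketch("#first word2"), "\n";
```

In this sketch "wor#d1" is left alone while "#word3" and a leading "#first" are linked. The same lookbehind should work for "@" in the username rule. It does not help with a "#" that follows a "/" inside a URL.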

As said, it works fine for most cases, but in rare situations Twitter users post URLs with a "#" inside. In this case the first replace rule matches the URL, and then the hashtag rule replaces the part after the "#" with another link. Thus, you end up with two interwoven links, which obviously doesn't work. (The same applies to @username text elements and posted e-mail addresses.)
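One way the interwoven-links problem might be avoided is to do everything in a single pass, so that text already consumed as a URL is never scanned again for "#" or "@". The following is only a sketch under assumptions: it uses preg_replace_callback (PCRE) rather than eregi_replace, the function name link_tweet_sketch is hypothetical, and the URL sub-pattern is simply carried over from the function in the post.

```php
<?php
// Sketch only: one combined pattern, scanned left to right in a single pass.
// The function name is hypothetical; the URL part mirrors the pattern in the post.
function link_tweet_sketch($message) {
    $pattern = '~'
        . '(?P<url>[[:alnum:]]+://[^\s]*[[:alnum:]#?/&=])'  // whole URL, including any "#" in it
        . '|(?<![[:alnum:]])@(?P<user>[[:alnum:]]+)'        // @username at the start of a word
        . '|(?<![[:alnum:]])#(?P<tag>[[:alnum:]]+)'         // #hashtag at the start of a word
        . '~';

    return preg_replace_callback($pattern, function ($m) {
        // Groups that did not take part in the match are empty strings
        // (or missing, if they come after the last matched group).
        if ($m['url'] !== '') {
            return '<a href="' . $m['url'] . '" target="_blank">' . $m['url'] . '</a>';
        }
        if ($m['user'] !== '') {
            return '<a href="http://twitter.com/' . $m['user'] . '" target="_blank">@' . $m['user'] . '</a>';
        }
        return '<a href="http://search.twitter.com/search?q=%23' . $m['tag'] . '" target="_blank">#' . $m['tag'] . '</a>';
    }, $message);
}

echo link_tweet_sketch("see http://example.com/page#top and #php by @nils, mail nils@example.com"), "\n";
```

Because the URL alternative comes first and swallows the whole URL, the "#top" inside it is never offered to the hashtag rule. The e-mail address stays untouched because its "@" is preceded by a letter.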
As a Plan B, I could explode the message at spaces for these replacements, walk through the array of words, replace where matched, and afterwards implode again with a space as glue, but that is not very elegant.
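A minimal sketch of that Plan B could look roughly like this. The function name link_words_sketch is made up, and preg_match (PCRE) is an assumption on my side; the post itself only uses the ereg functions. It covers only the "#" and "@" replacements, not the URL rule.

```php
<?php
// Sketch of the "Plan B" idea; the function name is hypothetical.
function link_words_sketch($message) {
    $words = explode(' ', $message);
    foreach ($words as $i => $word) {
        // Only words that START with "#" or "@" are touched, so a "#" inside
        // a URL or an "@" inside an e-mail address is left alone.
        // $m[1] is the tag or username, $m[2] is whatever follows (e.g. punctuation).
        if (preg_match('/^#([[:alnum:]]+)(.*)$/s', $word, $m)) {
            $words[$i] = '<a href="http://search.twitter.com/search?q=%23' . $m[1]
                . '" target="_blank">#' . $m[1] . '</a>' . $m[2];
        } elseif (preg_match('/^@([[:alnum:]]+)(.*)$/s', $word, $m)) {
            $words[$i] = '<a href="http://twitter.com/' . $m[1]
                . '" target="_blank">@' . $m[1] . '</a>' . $m[2];
        }
    }
    return implode(' ', $words);
}

echo link_words_sketch("#php rocks, says @nils. see http://example.com/#top wor#d1"), "\n";
```

Splitting only at spaces means a tag directly after a line break or an opening bracket would be missed, which is part of why this feels less elegant than a pure pattern solution.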

Best
Nils