Wed, 2 Jul '08
Filter unicode ranges in Yahoo Pipes
by Frank Spychalski filed under all

I stopped working on my RSS filter because I have to admit that Yahoo Pipes is damn good, way better than anything I could do on my own.

One of my filters removed links with non-ASCII titles from delicious/popular (no offense, but I cannot read Japanese/Chinese/Russian). This is a broad generalization, but it worked pretty well. I had to search a little until I found some good documentation on Unicode and regex syntax that works with Pipes:

Perl and PCRE do not support the \uFFFF syntax. They use \x{FFFF} instead. You can omit leading zeros in the hexadecimal number between the curly braces. Since \x by itself is not a valid regex token, \x{1234} can never be confused to match \x 1234 times. It always matches the Unicode code point U+1234. \x{1234}{5678} will try to match code point U+1234 exactly 5678 times.
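
To make the idea concrete, here is a rough Python sketch of the same kind of filter. It is not the actual Pipes configuration: the function name `ascii_only` and the sample titles are made up for illustration. Python's `re` module does not accept the `\x{FFFF}` form, so the character class uses two-digit `\xHH` escapes; in a Perl/PCRE-style engine the same range could presumably be written as `[^\x{0}-\x{7F}]`.

```python
import re

# Matches any single character outside the 7-bit ASCII range (U+0000 to U+007F).
# Assumption: in a Perl/PCRE-style engine this could be written [^\x{0}-\x{7F}].
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def ascii_only(titles):
    """Keep only the titles that contain no non-ASCII characters."""
    return [title for title in titles if not NON_ASCII.search(title)]


# Hypothetical feed titles, written with \u escapes so the source stays ASCII.
sample = [
    "Yahoo Pipes tutorial",
    "\u65e5\u672c\u8a9e\u306e\u30bf\u30a4\u30c8\u30eb",  # Japanese
    "\u0420\u0443\u0441\u0441\u043a\u0438\u0439",        # Russian
    "Caf\u00e9 guide",                                    # accented Latin
]

print(ascii_only(sample))  # prints ['Yahoo Pipes tutorial']
```

Note that the accented "Café guide" title is dropped as well, which is exactly the broad generalization mentioned above: anything outside plain ASCII gets filtered, not just titles in other scripts.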

