##### strip_ad.rc
#
# Resource file for procmail.
# Filters ads from emails and replaces them with a string.
#
# Variables, in (set before calling):
#   AD_START    Regular expression matching the first line of the ad
#               (remember that awk needs 2 "\" to quote a character!)
#   AD_END      Regular expression matching the last line of the ad
#   AD_MATCH    Regular expression which must match inside the ad
#   AD_NOMATCH  Regular expression which must not match inside the ad
#               (probably want to set this at least to an empty line)
#               If empty this test will be skipped.
#   AD_NOTE     Text with which every found ad is replaced
#               (set empty, or any string; use \n instead of a literal
#               newline)
#   AWK         The awk to use, if set. If unset, a search for a
#               suitable awk is made, and stored in AWK.
# Variables, returned:
#   AWK         The name of the awk program used
#
# Run with e.g.:
#   AD_START="^=======*$"
#   AD_END="$AD_START"
#   AD_MATCH='http://www\\.qksrv\\.net/click-[0-9-]*'
#   AD_NOMATCH='^$'
#   AD_NOTE='[ad removed]\n'
#   INCLUDERC=yourpath/strip_ad.rc
# in your $HOME/.procmailrc
#
# The latest version is always available from:
#   http://volker.dnsalias.net/soft/procmail/
#
# Copyright (C) by Volker Kuhlmann
# Released under the terms of the GNU General Public License (GPL) Version 2.
# See http://www.gnu.org/ for details.
#
# Volker Kuhlmann
#   14, 18 Apr 2002
#   19 Jul; 1 Oct 2003
#   14 Sep 2004
#

## Detect which version of awk to use, in order: gawk, mawk, nawk, awk
## If AWK is set, the given awk program is used.
## If AWK is unset, a search will be made and stored in AWK, so that it is set
## already for the next time.
## Each candidate is probed by running a trivial BEGIN-only program on empty
## input; the first one that runs successfully is used.
:0 i
* AWK ?? ^^^^
AWK=| for awk in gawk mawk nawk awk; do \
        ($awk 'BEGIN { exit 0 }' </dev/null) >/dev/null 2>&1 \
            && { echo $awk; break; } done
#
## Or set variable here and comment out the 5 lines above, or probably even
## better, set AWK= at the start of your ~/.procmailrc .
#AWK=gawk

## Filter ad texts
:0
* ! AWK ?? ^^^^
{
    # make sure the regex AD_NOMATCH__ is never empty (if AD_NOMATCH is empty,
    # AD_NOMATCH__ is never tested)
    AD_NOMATCH__=$AD_NOMATCH
    :0
    * AD_NOMATCH__ ?? ^^^^
    { AD_NOMATCH__=DumbSolaris2.7AWK }

    :0 fbw
    | $AWK '\
        BEGIN { ad=0; text=""; IGNORECASE=1 }\
        ad && "'"$AD_NOMATCH"'" != "" && $0 ~ "'"$AD_NOMATCH__"'" { \
            ad=0; print text $0; text=""; next }\
        ad && $0 ~ "'"$AD_END"'" {\
            ad=0; \
            if (match(text,"'"$AD_MATCH"'")) {\
                printf "%s", "'"$AD_NOTE"'";\
                text=""; next;\
            } else {\
                printf "%s", text;\
                text="";\
            }\
        }\
        $0 ~ "'"$AD_START"'" { ad=1 }\
        ad { text=text $0 "\n" }\
        !ad { print }\
        '

    AD_NOMATCH__=
}

# The usual headache with bare-bones Unix-rubbish: the sed solution never(!)
# works under Solaris 2.7 because -e handles neither newlines nor nested
# '{ }'-lists. Solaris awk is also too dumb - nawk is required.
# Needless to say, the GNU tools never have a problem. Long live the Penguin!

# With awk (change to nawk, gawk etc. if necessary, e.g. Solaris awk is bad):
# 10Apr02: Arrrgh, with nawk/gawk one has to use "\\." to get a literal "."!
# The replacement string can now be empty, i.e. note=''.

# Had a go at implementing this in sed, but didn't manage. Giving up.
# Sed is very dangerous in procmail anyway with its expression delimiting,
# see comments in procmail_vacation.rc -VK 14Apr02

# Solaris 2.7 nawk is too dumb to ignore an empty regex. This gives an error:
#    ad && "'"$AD_NOMATCH"'" != "" && $0 ~ "'"$AD_NOMATCH"'" {
# if AD_NOMATCH is the empty string.
# It's impossible to test for this within nawk; one needs to change the nawk
# program or make sure that part is never empty. -VK 1Oct03

# See strip_egroups_ad_old.rc for the sed implementation which never quite
# worked.

##### EOF strip_ad.rc