Code clj FAQ automation

Hello,

I'm posting the software for one-FAQ-a-day as described on
http://tinyurl.com/qcxw7
(comp.lang.javascript, July 18 2006, titled "CLJ newsgroup FAQ")
and on
http://tinyurl.com/ppt2s
(comp.lang.javascript, July 22 2006, titled "Automation of
comp.lang.javascript FAQ")

I did a test of all entries to the alt.comp.test newsgroup:
http://groups.google.com/group/alt.comp.test/

The first chapter of The FAQ ("meta-FAQ meta-questions") is excluded
from the daily messages.
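
For readers who only want the idea behind the daily rotation, here is a minimal sketch. It is not the script itself: the helper name `next_entry` and the list of chapter sizes are made up for illustration, and chapters and entries are counted from 0, so chapter 0 stands for the excluded meta-FAQ chapter.

```perl
use strict;
use warnings;

# Hypothetical helper: $sizes is an array ref holding the number of
# entries in each chapter (chapter 0 is the excluded meta-FAQ chapter).
# Returns the (chapter, entry) pair to post on the following day.
sub next_entry {
    my ($sizes, $chap, $cnt) = @_;

    # another entry left in the same chapter ?
    return ($chap, $cnt + 1) if $cnt + 1 < $sizes->[$chap];

    # otherwise the first entry of the next chapter, if there is one
    return ($chap + 1, 0)
        if $chap + 1 < @$sizes && $sizes->[$chap + 1] > 0;

    # end of the FAQ: start over at chapter 1, never at chapter 0
    return (1, 0);
}

my @sizes = (3, 2, 2);    # assumed example data
print join('|', next_entry(\@sizes, 1, 1)), "\n";    # prints 2|0
```

The pair is printed in the same "chapter|entry" form that the counter file uses.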

The crontab is set to fire off one message a day at 16:00
Europe/Brussels Time.

I'm posting the code below for general review. I hope you all like it;
comments are welcome, of course.

--------------------------------------
--------------------------------------
BEGIN CODE
--------------------------------------
--------------------------------------
#!/usr/bin/perl

####################################################################
# comp.lang.javascript FAQ - Daily sendout to Usenet based upon    #
# XML feed in defined format.                                      #
#------------------------------------------------------------------#
# Message headers can only contain 7bit ASCII chars (RFC2822).     #
# I'm using ISO-8859-1 in the message bodies for maximum           #
# compatibility with all kinds of newsreaders.                     #
#------------------------------------------------------------------#
# Code by Bart Van der Donck - www.dotinternet.be - Aug 2006.      #
#------------------------------------------------------------------#
# This program is free software released under the GNU/GPL; you    #
# can redistribute it and/or modify it under the GNU/GPL terms.    #
####################################################################

####################################################################
# Load modules & locale for English formatted date.                #
# POSIX is core; the other modules may need installing from CPAN.  #
####################################################################

use strict;
use warnings;
use POSIX;
use Net::NNTP;
use LWP::UserAgent;
use HTML::Parser;
use HTML::Entities;
use XML::Parser;
use XML::Simple;

setlocale(LC_ALL, 'English (UK)');

####################################################################
# Configuration area.                                              #
####################################################################

# account on news server (leave both blank if no authentication
# is needed)
my $account = 'j*********@dotinternet.be';
my $password = 'secret';

# server and newsgroup
my $server = 'news.sunsite.dk';
my $ng = 'comp.lang.javascript';

# sender data
my $sendername = 'FAQ server';
my $senderaddress = 'j*********@dotinternet.be';

# where is the XML file to load ?
my $xml_file = 'http://www.jibbering.com/faq/index.xml';

# footer of the message
my $footer = <<'FOOT';
--
Postings such as this are automatically sent once a day. Their goal
is to answer repeated questions, and to offer the content to the
community for continuous evaluation/improvement. The complete
comp.lang.javascript FAQ is at http://www.jibbering.com/faq/.
The FAQ workers are a group of volunteers.
FOOT

# where is writable file that keeps track of the counter
# (path must be absolute or relative to this script)
my $writablefile = 'entry2post.cnt';
my $fc;

# misc. header settings, these should be left untouched
my $mime_version = '1.0';
my $charset = 'iso-8859-1';
my $content_type = 'text/plain';
my $trans_enc = '8bit';
my $organization = 'comp.lang.javascript FAQ workers';
my $date = strftime "%a, %d %b %Y %H:%M:%S +0000", gmtime;

# tag => replacement pairs for a readable Usenet layout
my %regexes = (
    "p"         => "\n",
    "/p"        => "\n",
    "em"        => "_",
    "/em"       => "_",
    "url"       => "\n\n",
    "/url"      => "\n\n",
    "ul"        => "\n",
    "/ul"       => "\n",
    "li"        => "* ",
    "/li"       => "",
    "moreinfo"  => "\n\n",
    "/moreinfo" => "\n\n",
    "resource"  => "\n\n",
    "/resource" => "\n\n",
    "icode"     => "`` ",
    "/icode"    => " ''",
    "code"      => "\n\n",
    "/code"     => "\n\n",
);

# run options
my $sendout = 1;  # 1 = send to Usenet, 0 = print to screen.
my $printnrs = 0; # 1 = include FAQ chapter & entry nr,
                  # 0 = exclude. CAUTION! as it takes this
                  # data not from XML feed but from this
                  # program's internal counting.

####################################################################
# Get XML file.                                                    #
####################################################################

# fetch XML file
my $ua = LWP::UserAgent->new;
$ua->agent("AgentName/0.1 " . $ua->agent);
my $req = HTTP::Request->new(GET => $xml_file);
$req->content_type('text/xml');
my $res = $ua->request($req);
unless ($res->is_success) {
    # LWP reports failures in the response object, not in $!
    die "Error: couldn't get $xml_file: " . $res->status_line . "\n";
}

# is XML file well-formed ?
my $xml = $res->content;
eval { XML::Parser->new(ErrorContext => 1)->parse($xml) };
if ($@) {
    die "Error: $xml_file is not well-formed XML\n";
}

####################################################################
# Regexes on XML feed.                                             #
####################################################################

# regex the mentioned tags to Usenet layout format; \b keeps a key
# such as "p" from also matching longer tag names that start with it
while ( my ($key, $val) = each %regexes ) {
    $xml =~ s/<\Q$key\E\b(?:[^>'"]*|(['"]).*?\1)*>/$val/gsi;
}

# regex out all other tags except CONTENTS, CONTENT, FAQ, TITLE
my $result_xml = '';
my @report_tags = qw(content contents faq title);
HTML::Parser->new(
    api_version   => 3,
    start_h       => [\&tag, 'tokenpos, text'],
    process_h     => ['', ''],
    comment_h     => ['', ''],
    declaration_h => [\&decl, 'tagname, text'],
    default_h     => [\&text, 'text'],
    report_tags   => \@report_tags,
)->parse($xml)->eof;    # eof flushes any text still buffered

# check for well-formedness
eval { XML::Parser->new(ErrorContext => 1)->parse($result_xml) };
if ($@) {
    die "Error: XML file not well-formed after Usenet format regexes";
}

####################################################################
# Decide which subject/body part we need.                          #
####################################################################

# tie xml to vars
my $xml_ref = XMLin($result_xml, ForceArray => 1);

# load counter file ("chapter|entry", both counted from 0)
open my $F, '<', $writablefile
    or die "Error: can't open $writablefile: $!";
flock($F, 1) or die "Error: can't get LOCK_SH on $writablefile: $!";
$fc = $_ while <$F>;
close $F or die "Error: can't close $writablefile: $!";
die "Error: $writablefile is empty" unless defined $fc;
chomp $fc;
my ($chap, $cnt) = split /\|/, $fc;

# lookup subject/body in hashed structure
unless ($xml_ref->{CONTENTS}->[0]
                ->{CONTENT}->[$chap]
                ->{CONTENT}->[$cnt]) {
    save4next ( $chap, $cnt );
    die "Error: FAQ entry ".($chap+1).".".($cnt+1).". doesn't exist";
}

my $part = $xml_ref->{CONTENTS}->[0]
                   ->{CONTENT}->[$chap]
                   ->{CONTENT}->[$cnt];
my %hash_deref = %$part;
my $subject = $hash_deref{TITLE};
my $body = $hash_deref{content};

####################################################################
# Regexes on $body and $subject and compile final $message         #
####################################################################

# decode num/char HTML entities in subject and in message
$subject = HTML::Entities::decode($subject);
$body = HTML::Entities::decode($body);

# take care of Euro sign towards ISO-8859-1, just in case
# (after decoding it is the single character U+20AC)
s/\x{20AC}/Euro/g for ($body, $subject);

# don't allow EOLs and successive blanks in subject lines
$subject =~ s/\n/ /g;
$subject =~ s/\s+/ /g;

# remove 1-6 initial blanks from begin + all from end
my @splitbody = split /\n/, $body;
for (@splitbody) {
    s/\s+$//;
    s/^\s{1,6}//;
    s/^\s+http:/http:/g; # issue with leading http on line
}
$body = join "\n", @splitbody;

# collapse three or more successive EOLs into two
$body =~ s/\n{3,}/\n\n/gs;

# remove all EOLs from begin and end of $body
for ($body) {
    s/^\n+//;
    s/\n+$//;
}

# should we add the FAQ entry chapter/number ? (own counting,
# shown 1-based like in the error messages)
if ($printnrs==1) {
    $subject = 'FAQ ' . ($chap+1) . '.' . ($cnt+1) . '. - ' . $subject;
}
else {
    $subject = 'FAQ - ' . $subject;
}

# format full body
$body = "\x2D" x 71 . "\n" . $subject . "\n" . "\x2D" x 71
. "\n" x 2 . $body . "\n" x 3 . $footer;

# remove lines that consist only of 1 dot
$body =~ s/\n\.\n/\n/g;

# compute & store which entry is to send next time
save4next ( $chap, $cnt );

# compile complete message
my $message = <<EOM;
Reply-To: "$sendername" <$senderaddress>
From: "$sendername" <$senderaddress>
Date: $date
Newsgroups: $ng
Subject: $subject
Organization: $organization
Mime-Version: $mime_version
Content-Type: $content_type; charset="$charset"
Content-Transfer-Encoding: $trans_enc\n
$body
EOM

# should we send the message to Usenet or print to screen ?
if ($sendout != 1) {
    print $message;
    exit;
}

####################################################################
# Fire off the message.                                            #
####################################################################

# do some final checks
if ( !$message || $message eq '' || !$body || $body eq ''
     || !$subject || $subject eq '') {
    die "Error: didn't send message due to malformed data";
}

# send action (heavy error checking)
my $nntp = Net::NNTP->new( $server )
    || die "Error: can't connect to $server: $!\n";

# the server's own reply text says more than $! does
$nntp->authinfo( $account, $password )
    || die "Error: Net::NNTP->authinfo() failed: " . $nntp->message
    if ( defined $account && defined $password
         && $account ne '' && $password ne '');

$nntp->postok() || die "Error: $server said: not allowed to post\n";

$nntp->post( $message )
    || die "Error: can't send message: " . $nntp->message;
$nntp->quit;

####################################################################
# HTML::Parser and $chap/$cnt counting routines.                   #
####################################################################

sub tag {
    my ($pos, $text) = @_;
    if (@$pos >= 4) {
        my ($k_offset, $k_len, $v_offset, $v_len) = @{$pos}[-4 .. -1];
        my $next_attr = $v_offset ? $v_offset+$v_len : $k_offset+$k_len;
        my $edited;    # never set here, so $text passes through unchanged
        while (@$pos >= 4) {
            ($k_offset, $k_len, $v_offset, $v_len) = splice @$pos, -4;
            $next_attr = $k_offset;
        }
        $text =~ s/^(<\w+)\s+>$/$1>/ if $edited;
    }
    $result_xml .= $text;
}

sub decl {
    my $type = shift;
    $result_xml .= shift if $type eq 'doctype';
}

sub text {
    $result_xml .= shift;
}

sub save4next {
    my ($ch, $cn) = @_;

    # next entry in same chapter exists ?
    if ($xml_ref->{CONTENTS}->[0]
                ->{CONTENT}->[$ch]
                ->{CONTENT}->[$cn+1]) {
        writefile( $ch . '|' . ($cn+1) );
        return;
    }

    # first entry in next chapter exists ?
    if ($xml_ref->{CONTENTS}->[0]
                ->{CONTENT}->[$ch+1]
                ->{CONTENT}->[0]) {
        writefile( ($ch+1).'|0' );
        return;
    }

    # reset entries if we're at the last entry of the last chapter
    if ($xml_ref->{CONTENTS}->[0]
                ->{CONTENT}->[1]
                ->{CONTENT}->[0]) {
        writefile( '1|0' );
        return;
    }

    # last resort: no entry found => reset counter and die
    writefile( '1|0' );
    die "Error: couldn't find next entry for FAQ ".($ch+1).".".($cn+1)
        ."; next time I'll take the first entry again";
}

sub writefile {
    my ($data) = @_;
    # low-precedence "or" so that die applies to open, not the filename
    open my $WR, '>', $writablefile
        or die "Error: can't open $writablefile: $!";
    print {$WR} $data;
    close $WR or die "Error: can't close $writablefile: $!";
}

__END__
--------------------------------------
--------------------------------------
END CODE
--------------------------------------
--------------------------------------
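
To try the tag-to-layout step on its own, without fetching the feed, something like the following should do. It is a cut-down sketch: `%layout` holds only a few of the pairs, `to_usenet` is a name made up here, and the pattern is simpler than the one in the script (it does not look inside quoted attribute values).

```perl
use strict;
use warnings;

# A few tag => replacement pairs, for illustration only
my %layout = (
    'p'   => "\n",
    '/p'  => "\n",
    'em'  => '_',
    '/em' => '_',
    'li'  => '* ',
    '/li' => '',
);

# Hypothetical helper: replace each known tag by its plain-text layout
sub to_usenet {
    my ($text) = @_;
    for my $tag (sort keys %layout) {
        # \b stops "p" from matching e.g. <pre>; [^>]* skips attributes
        $text =~ s/<\Q$tag\E\b[^>]*>/$layout{$tag}/gi;
    }
    return $text;
}

print to_usenet('<li>Use <em>typeof</em></li>'), "\n";   # prints: * Use _typeof_
```

Tags that are not in the hash are left alone by this step.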

--
Bart

Jul 31 '06 #1
JRS: In article <11*********************@m79g2000cwm.googlegroups.com>,
dated Mon, 31 Jul 2006 05:32:14 remote, seen in
news:comp.lang.javascript, Bart Van der Donck <ba**@nijlen.com> posted :
>The first chapter of The FAQ ("meta-FAQ meta-questions") is excluded
>from the daily messages.
>
>The crontab is set to fire off one message a day at 16:00
>Europe/Brussels Time.

Convenient for your test, maybe; but, unless that's specifically chosen
to annoy the Merkins, ISTM that it would be far better to use
midnight UTC, or some arbitrarily-chosen minute of the first hour of the
UTC day if it is felt that too many other things happen at the exact
hour.

All intelligent Europeans know what Brussels time is; but it's not
reasonable to expect it to be known by the rest of the world.

--
© John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 MIME. ©
Web <URL:http://www.merlyn.demon.co.uk/> - w. FAQish topics, links, acronyms
PAS EXE etc : <URL:http://www.merlyn.demon.co.uk/programs/> - see 00index.htm
Dates - miscdate.htm moredate.htm js-dates.htm pas-time.htm critdate.htm etc.
Jul 31 '06 #2

Dr John Stockton wrote:
>JRS: In article <11*********************@m79g2000cwm.googlegroups.com>,
>dated Mon, 31 Jul 2006 05:32:14 remote, seen in
>news:comp.lang.javascript, Bart Van der Donck <ba**@nijlen.com> posted :
>>The crontab is set to fire off one message a day at 16:00
>>Europe/Brussels Time.
>
>Convenient for your test, maybe; but, unless that's specifically chosen
>to annoy the Merkins, ISTM that it would be far better to use
>midnight UTC, or some arbitrarily-chosen minute of the first hour of the
>UTC day if it is felt that too many other things happen at the exact
>hour.

Unfortunately I can only schedule my crontab relative to the machine's
local time, not to anything else. I've set the cronjob to 01:00 CET
(Europe/Brussels time). This means that the message will now be sent at
midnight Europe/London time (UTC+0 in winter and UTC+1 in summer).
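
A possible way around the local-time limitation, as a sketch only and not something the script does: let cron start the job every hour and have the script decide for itself, from `gmtime`, whether the wanted UTC hour has come. The helper name `is_posting_hour` and the default of hour 0 are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $epoch falls within the wanted UTC
# hour, false (0) otherwise. The hour defaults to 0, i.e. midnight UTC.
sub is_posting_hour {
    my ($epoch, $utc_hour) = @_;
    $utc_hour = 0 unless defined $utc_hour;
    my $hour = (gmtime $epoch)[2];    # hour of the day in UTC, 0..23
    return $hour == $utc_hour ? 1 : 0;
}

print is_posting_hour(0), "\n";       # epoch 0 is 00:00:00 UTC, prints 1
print is_posting_hour(3600), "\n";    # 01:00:00 UTC, prints 0
```

With an hourly cron entry, a line such as `exit unless is_posting_hour(time);` near the top of the script would then let exactly one run per day go through, whatever time zone the machine is in.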

--
Bart

Aug 1 '06 #3
