473,320 Members | 1,993 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,320 software developers and data experts.

SDN List & XML

http://www.treas.gov/offices/enforcement/ofac/sdn/

I wonder if it my be conceivable/practical to import on a regular
basis, hands-off, the XML version of this list directly from it's
source, the web, to a MS SQL database, in this case version 2K?

Thoughts, ideas?

Oct 2 '06 #1
10 4421

Zamdrist wrote:
I wonder if it my be conceivable/practical to import on a regular
basis, hands-off, the XML version of this list directly from it's
source, the web, to a MS SQL database, in this case version 2K?
Of course.

(The rest is such a basic and uninteresting question that I leave its
actual implementation for the reader. Or else hire a suitable
contractor. If you've no budget and you're doing it from scratch, then
read up on RSS aggregators as useful example code.)

Oct 2 '06 #2
Andy Dingley wrote:
Zamdrist wrote:
I wonder if it my be conceivable/practical to import on a regular
basis, hands-off, the XML version of this list directly from it's
source, the web, to a MS SQL database, in this case version 2K?

Of course.

(The rest is such a basic and uninteresting question that I leave its
actual implementation for the reader. Or else hire a suitable
contractor. If you've no budget and you're doing it from scratch, then
read up on RSS aggregators as useful example code.)
Thank you Andy. No, not asking anyone to do it for me, just looking for
a starting point.

Oct 2 '06 #3
Zamdrist wrote:
http://www.treas.gov/offices/enforcement/ofac/sdn/

I wonder if it my be conceivable/practical to import on a regular
basis, hands-off, the XML version of this list directly from it's
source, the web, to a MS SQL database, in this case version 2K?

Thoughts, ideas?
I see two different issues here: getting the XML, and importing it in
SQL2k.

First part is easy. The easier way I can figure out is to use a PHP
script (not a web page), from which you can download the XML. Then you
can call this script from a batch file, and call the batch file from a
CmdExec task in a DTS.

Here is a snippet that will get your file from the website. Save this
as yourfile.php.
(You need to have PHP installed, but you don't need to configure it in
a web server.)

# File begins here
<?
$handle = fopen("c:/SDN/sdn.xml", 'w');
fwrite($handle,
file_get_contents('http://www.treas.gov/offices/enforcement/ofac/sdn/sdn.xml'));
?>
# File ends here

To call the PHP script from a batch file:
php.exe yourfile.php

Make sure your PHP folder is in the system path.

The second issue is more tricky. With SQL2k there is no XML datatype,
so either you store the whole XML file in a varchar or text field, or
you use MXSML in a VBS to parse the content and you import in whatever
tables you have designed before.

Regards,
lucm

Oct 2 '06 #4

Zamdrist wrote:
Thank you Andy. No, not asking anyone to do it for me, just looking for
a starting point.
I hate to sound so negative here, because you've posted a very fair
question, but quite honestly this should either "just be obvious to
you", or else you should go talk to someone to whom it is already
obvious.

Oct 3 '06 #5
Andy Dingley wrote:
Zamdrist wrote:
Thank you Andy. No, not asking anyone to do it for me, just looking for
a starting point.

I hate to sound so negative here, because you've posted a very fair
question, but quite honestly this should either "just be obvious to
you", or else you should go talk to someone to whom it is already
obvious.
I appreciate your patience with my ignorance. I work extensively with
T-SQL, VB/VBA/VBScript...I'm not a complete moron, no you didn't say I
was.

Are you trying to say, it's as easy as say: File/Import?

I mean, there has to be a table definition somewhere first off right?

Oct 3 '06 #6
Andy Dingley wrote:
Zamdrist wrote:
Thank you Andy. No, not asking anyone to do it for me, just looking for
a starting point.

I hate to sound so negative here, because you've posted a very fair
question, but quite honestly this should either "just be obvious to
you", or else you should go talk to someone to whom it is already
obvious.
I'm looking into SQLXMLBulkLoad Com object right now, looks like what
I'll need to use.

Oct 3 '06 #7
Ok, I have some code and error for you :)

Some VBScript Code:

Dim objBulkLoad
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = "provider=SQLOLEDB;data
source=server\server;database=SDNList;Integrated Security=SSPI;"
objBulkLoad.SchemaGen = True
objBulkLoad.BulkLoad = False
objBulkLoad.Execute
"http://www.treas.gov/offices/enforcement/ofac/sdn/sdn.xsd",
"http://www.treas.gov/offices/enforcement/ofac/sdn/sdn.xml"
Set objBulkLoad = Nothing

Of course, that'll wrap terribly. But anyhow, the error: Schema:
relationship expected on 'publshInformation'

My goal at this point is only to import the schema, and once I have
that working, import the data.

Thoughts? Thanks.

Oct 3 '06 #8

Zamdrist wrote:
I'm looking into SQLXMLBulkLoad Com object right now, looks like what
I'll need to use.
Looks like a reasonable place to start. Of course it might not give
you the table structure you want, but consider loading into an ugly
scratch table, then fixing that through the SQL you're familiar with.

Some notes:

1. It's the web. The web is slow, broken and wrong, usually at least
one of these at any time. Get your content down into a local temp file
before trying to throw it at a bulk loader COM object. Even if it
_claims_ to load from any URL, it probably won't do so reliably. The
XMLHTTP COM object will be useful here.

2. Encoding and entiities. You shouldn't need to know about these,
until one day it all breaks.
2.1 Use UTF-8 for everything you make, and allow as many tools as
possible to sniff for encodings on incoming content - they're usually
good at it. Don't specify encodings explicitly unless you have -
they're a contract to get it right in the future too, and if you can't
guarantee this to work you're just writing tomorrow's bugs for
yourself.

2.2 You shouldn't need entities and shouldn't use them. If things
break because your supplier used them (depressingly common in the RSS
world) it's because they've assumed the pre-declaration of the HTML
entity set (typically accented foreign language characters).

3. I wouldn't use VBScript for _anything_, JavaScript / JScript beats
it every time (just for the error handling, if nothing else)

4. You need an XMLSchema to make things work with SQLXMLBulkLoad This
will affect both data typing and the inferred table layout.

5. As with any ETL, pull the thing in first to an unconstrained table,
then worry about setting up constraints, indexes and foreign keys
afterwards. Drop them if you have to before the import, then re-create
-- far too slow to do it row-by-row.

6. Assume it will break. Don't throw away the old data until you've
loaded the new data. This includes caching the incoming XML document,
validating it against the schema, checking it's not a simple (but
valid) error message saying "Service off-line today", loading it into
the DB (any old table), then loading it into the DB (the right tables,
structure, and fully indexed and keyed). _Any_ of these steps can
fail, and in a large decoupled near-real-time system like this, they
will do so from time to time.

7. Time your downloads. Time the intervals between them. Watch out for
whether the data is actually changing. Use some variant on an
exponential back-off algorithm so that you catch the changes promptly,
but don't hammer on their servers every few minutes (try measuring the
actual change rate and re-loading at twice this frequency) Maybe they
publish metadata suggest how often to try it, or they offer a
lightweight notification service to tell you when to grab the big one.

8. Log everything. Log when you tried, when it worked, when it failed,
and how long it took. If it takes twice as long to download at 3am,
then don't try and grab it at 3am! They're probably running some
cron'ed batch job at that time of night and you're hitting it. Making
reliable systems means caring about this level of trivia.

9. Be polite. Hammering a feed service's servers too much will tend to
get your IP blocked.

Oct 4 '06 #9
Andy Dingley wrote:
Zamdrist wrote:
I'm looking into SQLXMLBulkLoad Com object right now, looks like what
I'll need to use.

Looks like a reasonable place to start. Of course it might not give
you the table structure you want, but consider loading into an ugly
scratch table, then fixing that through the SQL you're familiar with.

Some notes:
Thank you Andy, looks like some good advice.

All I really wanna do is grab the data and slam it into SQL Server,
work with the data from there, I don't want to do anything fancy with
the XML. Doesn't look like SQLXMLBulkLoad will work for me as the feed
as an inline schema. I'll have to find another way.

Your point about javascript is well taken, and I've just been lazy
using vbscript.

Incidently MS Access 2003 sucks in this feed in a blink of an eye,
structure and data or just structure. Interesting, but ultimately not
useful as I would want to do directly to MS SQL Server.

Thank you for your help, advice & patience.

Oct 4 '06 #10
Here is what I've learned so far, and a review of what I wish to
accomplish.

First off what I want to accomplish: Import the SDN (Special Designated
Names) List from:
http://www.treas.gov/offices/enforce...ac/sdn/sdn.xml to SQL
Server. Preferrably 'as is' without having to modify the file at all.

The U.S. Treasury makes the xml file and an xsd file avialble also
here: http://www.treas.gov/offices/enforce...dn/index.shtml

It appears as though the xml file contains an in-line schema. Quite
obviously the xsd also provides a scheme definition. Because
SQLXMLBulkLoad *does not* support in-line schemas, I cannot use it that
functionality. I have searched exhausitively online for some clue as
how I could import this data into SQL Server as is.

Could someone *please* take a brief look at the xml & xsd, and let me
know if it's even possible to 'bulk' load this data as is, or if it has
to be copied locally first? Or what the recommendation is? I can write
DTS packages, stored procedures, vb or javascripts, including some .Net
also.

Additionally I have search Usenet, including this group, others too
have struggled with this, and this data in particular. Unfortunately
there do not appear to be any clear answers.

Thank you!

Oct 5 '06 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

9
by: Yomanium Yoth Taripoät II | last post by:
HI, 1) what are the differences between list and tuple? 2) how to concatenate tuple and list? no method, no opérator? 3) im looking the fucking manual, and cant add value in my tuple, when it...
2
by: Matthew Wieder | last post by:
In my previous post, I asked about a routine which prepares a string for an XPath query by taking care of escape characters. Unable to find a list, I'm now wondering assumign I enclose the...
3
by: John Sutor | last post by:
Does anyone know how to populate a listbox from an arraylist with delimited values. Either that or with just two values. *** Sent via Developersdex http://www.developersdex.com *** Don't just...
5
by: Tim | last post by:
Hi, I am trying to load up a drop list with semi aligned columns like so: Item 2 Something Else etc. To do so, I have tried to fill the Text valus of ListItems like so:
14
by: Arne | last post by:
A lot of Firefox users I know, says they have problems with validation where the ampersand sign has to be written as &amp; to be valid. I don't have Firefox my self and don't wont to install it only...
0
by: weiwei | last post by:
Hi here is my scenario, I create a drop down list in itemtemplate.(that drop down is created from db), after user click edit command, my ideal plan is have another drop down list in...
56
by: Zytan | last post by:
Obviously you can't just use a simple for loop, since you may skip over elements. You could modify the loop counter each time an element is deleted. But, the loop ending condition must be...
2
by: smartic | last post by:
i'm having three lists when i select from any one the others be visible by hierarchy but it takes to long to write my code is there is another away to write this code like XML that is my javascript...
7
by: TechnoAtif | last post by:
Hi..There is again a similar problem..Now i;ve got the option values of list?menu box to show the textbox. If the option value of list is yes then the textbox is to be displayed . Can anyone...
5
by: googletired | last post by:
On my current site, I can click on a category and the view the content of that category. This is fine, but now I want to be able to view the contents of *ALL* the categories in one continous scroll...
0
by: DolphinDB | last post by:
Tired of spending countless mintues downsampling your data? Look no further! In this article, you’ll learn how to efficiently downsample 6.48 billion high-frequency records to 61 million...
0
by: ryjfgjl | last post by:
ExcelToDatabase: batch import excel into database automatically...
0
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
1
isladogs
by: isladogs | last post by:
The next Access Europe meeting will be on Wednesday 6 Mar 2024 starting at 18:00 UK time (6PM UTC) and finishing at about 19:15 (7.15PM). In this month's session, we are pleased to welcome back...
0
by: Vimpel783 | last post by:
Hello! Guys, I found this code on the Internet, but I need to modify it a little. It works well, the problem is this: Data is sent from only one cell, in this case B5, but it is necessary that data...
0
by: ArrayDB | last post by:
The error message I've encountered is; ERROR:root:Error generating model response: exception: access violation writing 0x0000000000005140, which seems to be indicative of an access violation...
0
by: CloudSolutions | last post by:
Introduction: For many beginners and individual users, requiring a credit card and email registration may pose a barrier when starting to use cloud servers. However, some cloud server providers now...
0
by: Defcon1945 | last post by:
I'm trying to learn Python using Pycharm but import shutil doesn't work
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.