Bytes | Developer Community

Data Feed architecture

Hi, we have some data feeds which pull information from external sources.
Unfortunately, we have to use screen scraping, as there are no XML feeds.
The data feeds live in a variety of different applications on different
servers. I have to design a new architecture. I have a fair idea of how I
would do it, but if anyone has pointers to a good existing architecture
design, or *things not to do*, please post.

TIA
Markus
===================
googlenews2006markusj

Oct 1 '06 #1
7 Replies

The Microsoft Patterns & Practices website has some good guidelines for
architecture design:

http://msdn.microsoft.com/practices/
Oct 1 '06 #2
Adapters, agents, and messageware.

I've done this a couple of times so far. I'll need to know more about the
technologies you are working with to help more, though.

What does your environment look like? Do you have BizTalk or an ESB running
yet? What time requirements do you have for the data?

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
Oct 1 '06 #3
Hi Nick, we do not have BizTalk, and I'm not sure what you mean by ESB,
sorry.

Basically we have a number of distributed applications on a variety of
platforms (Classic ASP, .NET 1.1/2.0, and a Python script). These
applications are triggered by a scheduling program to go and "screen
scrape" information at a specified time.

All information is then logged into a centralized database so the data can
be used at a later date.

Database-wise, we are using MSSQL 2005.

TIA
Markus
Oct 1 '06 #4
Hi Markus,

From an architectural perspective, you have applications that draw data
using screen scraping. They interpret that data and store it in a database.
Part of what I need to know: how up to date does the data need to be?

Example:
Contoso Marine Supply is a catalog provider of small parts and fittings for
boaters. They have a Mainframe application, written in CICS, that is used
to enter catalog orders that arrive via a mail processing center.

At any time, the company employees can see the list of invoices that need to
be sent to the customer via a CICS screen on an IBM 3270 terminal.

If the system that prints and sends the invoices is on the Windows platform,
then it makes sense to pull the data periodically (perhaps nightly?) and, if
a new invoice is found, store the necessary data for printing. We could also
say that we print invoices twice a week.

In this scenario, the data needs to get to the Windows application twice a
week. We pull the data more often, which adds a level of *reliability*: if
the mainframe or the Windows server app is not running on Tuesday at
midnight, you can still pull the data on Wednesday for Thursday's print
run. This serves the reliable delivery of data.

A different scenario would be if the Windows server application is a Partner
Relationship Management system. In that case, the PRM system needs to know
about the orders as soon as they are entered, because a salesman may be
about to call on a particular supplier and needs accurate, up-to-date
information about the orders coming through for their parts. In this case,
the time requirement would be pretty much 'as soon as humanly possible' (I
like the term "near real time").

So I'm asking about the time requirements. You've got some of the
picture... you have apps that pull data. Cool. What data do they pull and
why do they pull it? That's pretty important info if I'm going to be
helpful.

ESB = Enterprise Service Bus.

Please tell me what type of app you are screen scraping (CICS, UNIX, AS/400,
what?).
--
--- Nick Malik [Microsoft]

Oct 2 '06 #5
Hi Nick, thanks for your help
Please see below
Nick Malik [Microsoft] wrote:
> Part of what I need to know: how up to date does the data need to be?
The import is done on a daily basis, so the information only needs to be
updated once a day from the existing data sources. Reports etc. are viewed
against this information all day long from many different sources (web
pages, applications, etc.).
> Please tell me what type of app you are screen scraping (CICS, UNIX,
> AS/400, what?).
It's just an external website. We just parse the HTML, retrieve the
information we need, and update the database.
Oct 2 '06 #6
<Ma*******@gmail.com> wrote:
> It's just an external website. We just parse the HTML, retrieve the
> information we need, and update the database.
My prior responses were overkill.

For your architecture, I would suggest that you create an app with two basic
abilities:

1. The ability to specify as many target data pages as you want in an XML
file. That way, if you want to expand the list of pages you pull data from,
or if the information provider decides to break the information up onto
multiple pages, you can adapt quickly.

2. The ability to define what data you want from the target page, and how to
find it on that page, using an XML description. That way, when the target
page changes its formatting or coding, you don't have to change your C# code
to get your data again.
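The two abilities above can be sketched as one small config-driven
extractor. Here is a minimal sketch in Python (also one of the platforms in
this thread); every element name, URL, and regex pattern below is invented
for illustration, not taken from the original post:

```python
# Sketch of the XML-driven scraper described above. The <page>/<field>
# schema, the URL, and the regex patterns are illustrative assumptions;
# adapt them to your own target pages.
import re
import xml.etree.ElementTree as ET

CONFIG = r"""
<pages>
  <page name="prices" url="http://example.com/prices.html">
    <field name="sku"   pattern="SKU:\s*(\w+)"/>
    <field name="price" pattern="Price:\s*\$([0-9.]+)"/>
  </page>
</pages>
"""

def load_rules(config_xml):
    """Parse the config into a list of (url, {field name: compiled regex})."""
    rules = []
    for page in ET.fromstring(config_xml).findall("page"):
        fields = {f.get("name"): re.compile(f.get("pattern"))
                  for f in page.findall("field")}
        rules.append((page.get("url"), fields))
    return rules

def scrape(html, fields):
    """Apply each field's regex to the fetched HTML; None if not found."""
    return {name: (m.group(1) if (m := rx.search(html)) else None)
            for name, rx in fields.items()}
```

The same idea carries over to C#: deserialize the XML into rule objects and
apply a `Regex` (or an HTML parser) per field, so a page redesign means
editing a config file rather than recompiling.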
I would suggest running your app as a service that executes nightly. I
notice that you posted your question to the ASP.NET newsgroup, so it is
possible that you are familiar only with creating web apps. Writing a
service is different, but not terribly difficult. Suggestion: create a
command-line utility that does the work of pulling the data. Then either
write a service to call your command-line utility, or simply schedule it
with the scheduling service in Windows. That makes your code easier to
write and debug. Keep in mind that your app needs to run without a user
interface of any kind: no input from the console, no output to the console
(except debugging messages).
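The command-line-utility suggestion above might look like this minimal
skeleton, again sketched in Python; the `--config` flag and the
`pull_all_feeds` helper are hypothetical placeholders, not part of the
original advice:

```python
# Minimal skeleton of the suggested pull utility: no interactive UI,
# diagnostics go to stderr only, and the return value becomes an exit
# code the scheduler (or a wrapping service) can check.
# The --config flag and pull_all_feeds are hypothetical placeholders.
import argparse
import sys

def pull_all_feeds(config_path):
    """Stand-in for the real work: fetch each configured page, extract
    the fields, and write rows to the central database. Returns the
    number of pages that failed."""
    print(f"pulling feeds listed in {config_path}", file=sys.stderr)
    return 0

def main(argv=None):
    parser = argparse.ArgumentParser(description="Nightly feed scraper")
    parser.add_argument("--config", default="feeds.xml",
                        help="XML file listing target pages and fields")
    args = parser.parse_args(argv)
    failures = pull_all_feeds(args.config)
    return 0 if failures == 0 else 1  # non-zero exit marks a failed run
```

A real utility would end with `if __name__ == "__main__":
sys.exit(main())` and could then be scheduled nightly, e.g. via the Windows
task scheduler; the non-zero exit code lets the scheduler or a wrapping
service detect and log failed runs.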

Using a service will make it much easier to reliably get the data you want,
and you can change the frequency with which you pull data simply by
changing the scheduler or your service code.

Hope this helps.

--
--- Nick Malik [Microsoft]
Oct 6 '06 #7
Thanks for your help Nick
Regards
Markus
Oct 8 '06 #8

This discussion thread is closed. Replies have been disabled for this
discussion.