Bytes | Software Development & Data Engineering Community
Data Feed architecture

Hi, we have some data feeds which pull info from external sources.
Unfortunately, we have to use screen scraping as there are no XML
feeds. The data feeds are located in a variety of different
applications on different servers. I have to design a new
architecture. I have a fair idea of how I would do it, but if anyone
has any pointers to a good existing architecture design or *things not
to do*, please post.

TIA
Markus
===================
googlenews2006markusj

Oct 1 '06 #1

The Microsoft Patterns & Practices website has some good guidelines for
architecture design:

http://msdn.microsoft.com/practices/
Ma*******@gmail.com wrote:
> Hi, we have some data feeds which pull info from external sources. [snip]
Oct 1 '06 #2
adapters, agents, and messageware.

I've done this a couple of times so far. I'll need to know more about the
technologies you are working with to help more, though.

What does your environment look like? Do you have BizTalk or an ESB running yet?
What time requirements do you have for the data?

--
--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster
http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
--
<Ma*******@gmail.com> wrote in message [snip]
> Hi, we have some data feeds which pull info from external sources. [snip]

Oct 1 '06 #3
Hi Nick, we do not have BizTalk and I'm not too sure what you mean by
ESB, sorry.

Basically we have a number of distributed applications on a variety of
platforms (Classic ASP, .NET 1.1/2.0 and a Python Script). These
applications are scheduled via a scheduling program to go away and
"screen scrape" information at a specified time.

All information is then logged into a centralized database so the data
can be used at a later date.

Database-wise, we are using MSSQL 2005.
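That central logging step might look something like this, sketched in Python with sqlite3 standing in for the MSSQL 2005 database (the table and column names are made up for illustration):

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in for the central MSSQL 2005 store; sqlite3 keeps the sketch
# self-contained. Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS scraped_data (
        source     TEXT,
        fetched_at TEXT,
        payload    TEXT
    )
""")

def log_scrape(source, payload):
    """Each scheduled scraper calls this after a successful pull."""
    conn.execute(
        "INSERT INTO scraped_data (source, fetched_at, payload) VALUES (?, ?, ?)",
        (source, datetime.now(timezone.utc).isoformat(), payload),
    )
    conn.commit()

# One scraper recording a pull; reports then query the shared table.
log_scrape("legacy-asp-feed", "price=19.95")
rows = conn.execute("SELECT source, payload FROM scraped_data").fetchall()
print(rows)
```

Because every scraper writes through the same table, the reports only ever read one place, regardless of which platform did the scraping.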

TIA
Markus
Nick Malik [Microsoft] wrote:
> adapters, agents, and messageware. [snip]
Oct 1 '06 #4
Hi Markus,

From an architectural perspective, you have applications that draw data
using screen scraping. They interpret that data and store it in a database.
Part of what I need to know: how up to date does the data need to be?

Example:
Contoso Marine Supply is a catalog provider of small parts and fittings for
boaters. They have a Mainframe application, written in CICS, that is used
to enter catalog orders that arrive via a mail processing center.

At any time, the company employees can see the list of invoices that need to
be sent to the customer via a CICS screen on an IBM 3270 terminal.

If the system that prints and sends the invoices is on the Windows platform,
then it makes sense that the data is pulled periodically (perhaps nightly?)
and if a new invoice is found, then the necessary data is stored for
printing. We could also say that we print invoices twice a week.

In this scenario, the data needs to get to the Windows application twice a
week. We pull the data more often, which adds a level of *reliability*
(because if the mainframe or the windows server app are not running on
Tuesday at midnight, you can still pull the data on Wednesday for Thursday's
print run... this serves the reliable delivery of data).

A different scenario may be if the Windows server application is a Partner
Relationship Management system. In that case, the PRM system needs to know
about the orders as soon as they are entered, because a salesman may be
about to call on a particular supplier, and they need accurate and
up-to-date information about the orders that are coming through for their
parts. In this case, the time requirements would be pretty much 'as soon as
humanly possible' (I like the term "near real time").

So I'm asking about the time requirements. You've got some of the
picture... you have apps that pull data. Cool. What data do they pull and
why do they pull it? That's pretty important info if I'm going to be
helpful.

ESB = Enterprise Service Bus.

Please tell me what type of app you are screen scraping (CICS, UNIX, AS/400,
what?).

Oct 2 '06 #5
Hi Nick, thanks for your help
Please see below
Nick Malik [Microsoft] wrote:
Hi Markus,

From an architectural perspective, you have applications that draw data
using screen scraping. They interpret that data and store it in a database.
Part of what I need to know: how up to date does the data need to be?
The import is done on a daily basis, so information only needs to be
updated once a day from the existing data sources. Reports etc. are
viewed against this information all day long from many different
sources (web pages, applications, etc.).
[snip]

Please tell me what type of app you are screen scraping (CICS, UNIX, AS/400,
what?).
It's just an external website. We just parse the HTML, retrieve the
information we need, and update the database.
Oct 2 '06 #6
<Ma*******@gmail.com> wrote in message [snip]
> It's just an external website. We just parse the HTML, retrieve the
> information we need and update the database.
My prior responses were overkill.

For your architecture, I would suggest that you create an app with two basic
abilities:
1. the ability to specify as many target data pages as you want in an XML
file. That way, if you want to expand the list of pages you want to pull
data from, or if the information provider decides to break the information
up onto multiple pages, you can adapt quickly.

2. the ability to define what data you want from your target page, and how
to find it on the target page, using an XML description. That way, when
the target page changes in formatting or coding, you don't have to change
your C# code to allow you to get your data again.
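For example, ability 2 could look something like this, sketched in Python for brevity (the config format, element names, and regex rules are invented for illustration, not a standard):

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical config: each <page> names a target URL and the fields to
# extract, each field defined by a regex with one capture group. When the
# site's markup changes, you edit this file, not the code.
CONFIG = """
<feeds>
  <page url="http://example.com/prices.html">
    <field name="price" pattern="Price:\\s*\\$([0-9.]+)"/>
    <field name="sku" pattern="SKU:\\s*([A-Z0-9-]+)"/>
  </page>
</feeds>
"""

def load_rules(config_xml):
    """Parse the XML description into (url, {field: compiled_regex}) pairs."""
    root = ET.fromstring(config_xml)
    rules = []
    for page in root.findall("page"):
        fields = {f.get("name"): re.compile(f.get("pattern"))
                  for f in page.findall("field")}
        rules.append((page.get("url"), fields))
    return rules

def scrape(html, fields):
    """Apply each field's regex to the fetched HTML; None if not found."""
    return {name: (m.group(1) if (m := pattern.search(html)) else None)
            for name, pattern in fields.items()}

# Example run against canned HTML (a real run would fetch each URL first).
sample_html = "<p>SKU: AB-123</p><p>Price: $19.95</p>"
for url, fields in load_rules(CONFIG):
    print(url, scrape(sample_html, fields))
```

Adding a new target page, or following the provider when they split content across pages, then becomes a config change rather than a redeploy.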

I would suggest that you run your app as a service that runs nightly. I
notice that you posted your question to the ASP.Net newsgroup, so it is
possible that you are familiar only with creating web apps. Writing a
service is different, but not terribly difficult. Suggestion: Create a
command line utility that will do the work of pulling the data. Then either
write a service to call your command line utility, or simply schedule your
command line utility with the scheduling service in Windows. That makes it
easier to write and debug your code. Keep in mind that your app needs to
run without calling a user interface of any kind. No input from console, no
output to console (except debugging messages).

Using a service will make it much easier to reliably get the data you want,
and you can change the frequency by which you pull data by simply changing
the scheduler or your service code.
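A skeleton for that command line utility might look like this (in Python for brevity; the function and file names here are placeholders):

```python
import logging
import sys

# Sketch of the suggested headless command-line utility. No console input;
# progress goes to a log file so the app can run unattended under the
# Windows scheduler or a thin wrapping service.
logging.basicConfig(filename="scrape.log", level=logging.INFO)

def pull_all_feeds(config_path):
    """Placeholder for the real work: read config, fetch pages, update DB."""
    logging.info("pulling feeds defined in %s", config_path)
    return 0  # number of pages that failed

def main(argv):
    config = argv[1] if len(argv) > 1 else "feeds.xml"
    failures = pull_all_feeds(config)
    return 1 if failures else 0  # exit code tells the scheduler if it worked

# In production this would be: sys.exit(main(sys.argv))
exit_code = main(["scraper.py"])
print("exit code:", exit_code)
```

The exit code is the only "user interface": the scheduler (or your wrapping service) checks it to decide whether to alert someone or retry.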

Hope this helps.

Oct 6 '06 #7
Thanks for your help Nick
Regards
Markus
Oct 8 '06 #8
