Running queries on large data structure

Christoph Haas

Hi, list...

I have written an application in Perl some time ago (I was young and needed
the money) that parses multiple large text files containing nested data
structures and allows the user to run quick queries on the data.
(For the firewall admins among you: it's a parser and web-based query tool
for CheckPoint firewall rulebases. The user can search for source and
destination IPs and get the matching rules.)

The current application consists of two parts:

(1) An importing process that reads and parses the large text files and
writes the data in different PostgreSQL tables.
(2) A web (CGI) interface that allows the user to query the collected
data from the PostgreSQL database by different criteria.
(I don't like PostgreSQL much due to the lack of decent tools like
phpmyadmin. Pgadmin3 and Phppgadmin don't give me the feeling that
I control the database. More the other way round. But PostgreSQL
has a nice 'inet' data type that allows for quick matches in tables
of IP addresses and networks.)

However the information in the (relational) database was stored in a
horribily artificial way. The SQL query is a 20-line monster with UNIONs
and LEFT JOINs and negations. It's lightning fast (0.5 seconds to search
over a 500 set consisting of complex rules) but neither the source code
nor the database is easy to handle any more. And I'd like to have more
flexibility in the kind of queries I run. So I'd like to trade the good
speed by some readability and a simpler - more object-oriented - data
structure.

I'm currently thinking of different ways to handle that but would like to
get some opinions about that:

(a) See what sqlalchemy can do for me to handle the object-relational
transformation and basically stay with PostgreSQL.
(b) Parse the input files into one large nested Python data structure.
Then write this structure to a file using "marshal" or "repr".
Then I have a very clean source code like
for rule in rules:
for src in sources:
if searched_for_src == src...
(c) ...?

What makes PostgreSQL less suited is the fact that CheckPoint rule bases
can contain several complex objects:
- Hosts (easy, they are just one IP address and can easily be compared)
- Networks (nearly as easy - just see if the IP is part of the network)
- Groups (slightly harder; can even be nested and contain other groups
and hosts or networks)
- IP ranges (10.0.0.50-10.5.25.100; not easy to parse either)

I would even like to allow the users more complex queries like multiple
search conditions. The query would be something like "show me all matching
firewall rules where 10.0.0.5 matches the source column and 192.168.42.1
matches the destination column OR any rule where the group 'internal hosts'
is mentioned in the destination column". It sounds like a database is the
right job. But somehow a database is also not flexible enough. And the
data is small enough (1 MB probably) that it can be read into memory.

What do you think would be the right tool for the job? Thanks for sharing
your thoughts.

Christoph

Aug 2 '06 #1

Subscribe Post Reply

1578

Similar topics

Views vs Queries

by: Mike N. | last post by:

Hello: I am new to T-SQL programing, and relativly new to SQL statements in general, although I have a good understanding of database theory. I'm a little confused as to the fundamental...

Microsoft SQL Server

parameterized queries running slower than non-parameterized queries

by: gary b | last post by:

Hello When I use a PreparedStatement (in jdbc) with the following query: SELECT store_groups_id FROM store_groups WHERE store_groups_id IS NOT NULL AND type = ? ORDER BY group_name

Microsoft SQL Server

ADO client disconnects after running a long query

by: Gary | last post by:

I am having a problem executing long running queries from an ASP application which connects to SQL Server 2000. Basically, I have batches of queries that are run using ADO in a loop written in...

Microsoft SQL Server

Speeding up select queries on table with 3million rows.

by: mfyahya | last post by:

Hi, I'm new to databases :) I need help speeding up select queries on my data which are currently taking 4-5 seconds. I set up a single large table of coordinates data with an index on the fields...

MySQL Database

Replacing data in a table WITHOUT changing the numbers created from a running sum..

by: StenKoll | last post by:

Help needed in order to create a register of stocks in a company. In accordance with local laws I need to give each individual share a number. I have accomplished this by establishing three tables...

Microsoft Access / VBA

Internal limitation: structure is too complex or too large.

by: Kirk Marple | last post by:

i have a large C++ data structure that i'm trying to interop with... the structure is 77400 bytes long. i have the structure defined in C#, so i was trying to just use "ref <structure>" as the...

C# / C Sharp

fill dataset/grid with multiple queries from multiple servers

by: Dave Edwards | last post by:

I understand that I can fill a datagrid with multiple queries, but I cannot figure out how to fill a dataset with the same query but run against multiple SQL servers, the query , table structure...

Visual Basic .NET

Passing large amounts of data between classes/functions

by: Macca | last post by:

Hi, I'm writing an application that will pass a large amount of data between classes/functions. In C++ it was more efficient to send a pointer to the object, e.g structure rather than passing...

C# / C Sharp

xslt queries in xml to SQL queries

by: Ian Roddis | last post by:

Hello, I want to embed SQL type queries within an XML data record. The XML looks something like this: <DISPLAYPAGE> <FIELD NAME="SERVER" TYPE="DROPDOWN"> <OPTION>1<OPTION> <OPTION>2<OPTION>...

Python

Wordpress or something else?

by: Faith0G | last post by:

I am starting a new it consulting business and it's been a while since I setup a new website. Is wordpress still the best web based software for hosting a 5 page website? The webpages will be...

Content Management Systems

Access Europe: Command bars, the Access Shortcut Tool and a simple Audit Log - Wed 3 April

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 3 Apr 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome former...

General

Easy Steps to Fix "Canon Printer Won't Connect to WiFi Network"

by: taylorcarr | last post by:

A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...

General

Basic Javascript concepts

by: aa123db | last post by:

Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...

Javascript

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Migrating Website to Cloud - Emmanuel Katto

by: emmanuelkatto | last post by:

Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel

General

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++