**THE PROBLEM**

**I can see that my hard-drive is working very hard throughout the operation.**The operation took about 2 hours (I just canceled it at this point, looked like it was about 10% done) with 200,000 vertices and an average of 50 edges per vertex (essentially, it had to insert about 12 million rows).

**How can I speed this up?**Is there a clever way to get SQL Server to do all the operations in RAM, and then just copy the data from RAM to disk in one contiguous stream after all the inserting is done?

I am developing on a 4gb RAM laptop, and the database should easily fit entirely into RAM several times over--I estimate that my "test" script would generate a 150mbyte database.

**THE BACKGROUND**

Each edge represents a uni-directional connection between two points (vertices) on the graph). So if point 10 connects to point 23 in a graph, there would be an "edge" entry in the database containing "10" and "23".

If a vertex/point is isolated (it connects to nothing), it is stored as an edge which "connects" to itself: i.e. if point 56 connects to no other points, then there would be an "edge" entry in the database containing "56" and "56".

**THE SCHEMA**

Expand|Select|Wrap|Line Numbers

- CREATE TABLE [dbo].[edges](
- [edge_id] [int] IDENTITY(1,1) NOT NULL,
- [graph_id] [int] NOT NULL,
- [primary_vertex_id] [int] NOT NULL,
- [adjacent_vertex_id] [int] NULL CONSTRAINT [DF_edges_adjacent_vertex_id] DEFAULT (NULL),
- CONSTRAINT [PK_edges] PRIMARY KEY CLUSTERED
- (
- [edge_id] ASC
- )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
- ) ON [PRIMARY]

**THE SCRIPT THAT TAKES TOO LONG**

Expand|Select|Wrap|Line Numbers

- DECLARE @graph_id INT
- DECLARE @vertices INT
- DECLARE @average_edges INT
- SET @graph_id = 35
- SET @vertices = 200000 /* 200,000 vertex graph */
- SET @average_edges = 50; /* approx 50 connections per vertex */
- BEGIN
- /* get the first new vertex id -- should usually be 1, but just in case there's already some vertices in graph */
- DECLARE @first_vertex_id INT
- SELECT @first_vertex_id = (1 + isnull(max(dbo.edges.primary_vertex_id),0)) FROM dbo.edges WHERE dbo.edges.graph_id = @graph_id;
- BEGIN
- DECLARE @counter INT
- DECLARE @new_vertex_id INT
- SET @counter = 0
- WHILE @counter < @vertices
- BEGIN
- SET @new_vertex_id = @first_vertex_id + @counter;
- SET @counter = @counter + 1;
- /* insert a partial edge with the new vertex id */
- INSERT INTO dbo.edges(graph_id,
- primary_vertex_id,
- adjacent_vertex_id)
- VALUES(@graph_id, @new_vertex_id, @new_vertex_id);
- END;
- RETURN 0;
- END;
- END;
- BEGIN
- DECLARE @lower INT
- DECLARE @upper INT
- SELECT @lower = min(dbo.edges.primary_vertex_id) FROM dbo.edges WHERE dbo.edges.graph_id = @graph_id
- SELECT @upper = max(dbo.edges.primary_vertex_id) FROM dbo.edges WHERE dbo.edges.graph_id = @graph_id
- /* note: this algorithm assumes that the vertex_ids are CONTIGUOUS!! and may create duplicate edges */
- BEGIN
- DECLARE @loop_max INT
- DECLARE @counter INT
- DECLARE @rn_one INT
- DECLARE @rn_two INT
- SET @counter = 0
- SELECT @loop_max = (@upper - @lower + 1) * @average_edges
- WHILE @counter < @loop_max
- BEGIN
- /* pick two random vertex IDs that exist on the graph */
- SELECT @rn_one = Round(((@Upper - @Lower -1) * Rand() + @Lower), 0);
- SELECT @rn_two = Round(((@Upper - @Lower -1) * Rand() + @Lower), 0);
- /* insert a bi-directional edge for those two vertices */
- INSERT INTO dbo.edges(graph_id,
- primary_vertex_id,
- adjacent_vertex_id)
- VALUES(@graph_id, @rn_one, @rn_two);
- INSERT INTO dbo.edges(graph_id,
- primary_vertex_id,
- adjacent_vertex_id)
- VALUES(@graph_id, @rn_two, @rn_one);
- SET @counter = @counter + 1;
- END
- END;
- END;