Implementing a Sequence Counter for multi-threaded crawler

I'm crawling a site and storing links to be crawled in a table. I have multiple threads running at the same time. What I'd like is for a thread to be able to get the primary key of the next unprocessed link in a single command and run with it.

I thought I could use a sequence counter, but I can't see a way to both increment the counter and return the value in a single command. Is there a way to do this? I want to avoid a race condition.
Aug 20 '07 #1
pbmods
5,821 Expert 4TB
Changed thread title to better describe the problem (did you know that threads whose titles contain three words or less actually get FEWER responses?).

Heya, formido. Welcome to TSDN!

You could do something like this:
```sql
SELECT @counter := @counter + 1;
```
Aug 20 '07 #2
Heh. Nope, never heard that. Awesome.

Regarding the code:

I've never programmed an RDBMS before, so I don't know all the ins and outs yet. It appears that I'd put that in a stored procedure and then call it from PHP?

I had thought there was some sort of feature called a 'sequence' that you could attach to a table or row. Your solution doesn't use this, but instead just increments a variable, right? If one thread increments the variable, will another thread see the new value if it accesses the same variable?

Now that I think of it, I had another question about 'sequences', which is that I thought I read that they were special to clients, so incrementing them wouldn't be visible to other clients. I assume each thread is considered a separate client?
Aug 20 '07 #3
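
On the visibility question in the post above: in MySQL, a user variable such as `@counter` belongs to the connection that set it, so a different connection never sees its value. A minimal sketch, assuming two separate client connections (labelled A and B here purely for illustration):

```sql
-- Connection A
SET @counter := 0;
SELECT @counter := @counter + 1;   -- returns 1

-- Connection B (a different client session)
SELECT @counter;                   -- returns NULL: B has its own, unset @counter
```

If each crawler process opens its own connection, a user variable therefore cannot act as a counter shared between them.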
pbmods
5,821 Expert 4TB
Heya, formido.

If you are inserting data into a table, you'll like MySQL's auto_increment feature:

```sql
CREATE TABLE `sequences` (
    `sequenceid` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ...
);
```
Aug 20 '07 #4
No, yeah, it's great, but it doesn't quite solve my problem, I think. See, when one of the spider PHP processes needs a new link, it gets it out of the links table. There has to be a bulletproof way for any process to request the next unprocessed link in the links table. I thought the best way would be a counter maintained by SQL: a process would ask SQL for the next number, and SQL would take care of incrementing the counter for the next process that requested it. That number would then be looked up in the auto-incremented primary key column of the links table to retrieve the next URL.
Aug 20 '07 #5
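
One way to get the increment-and-return behaviour described in the post above is the `LAST_INSERT_ID(expr)` idiom documented in the MySQL manual. This is a sketch rather than a tested part of the crawler: the table and column names (`link_counter`, `next_id`) are hypothetical, and it assumes each crawler process uses its own connection.

```sql
-- One-time setup: a single-row counter table (names are hypothetical).
CREATE TABLE link_counter (
    next_id BIGINT UNSIGNED NOT NULL
);
INSERT INTO link_counter (next_id) VALUES (0);

-- In each crawler process, on its own connection:
-- increment the shared counter and remember the new value for this connection.
UPDATE link_counter SET next_id = LAST_INSERT_ID(next_id + 1);

-- Read back the value this connection just generated.
SELECT LAST_INSERT_ID();
```

The counter row is shared by every connection, while the value returned by `LAST_INSERT_ID()` is kept per connection, so another process incrementing the counter in between does not change what this one reads back. That is also the "special to clients" behaviour asked about earlier: only the remembered value is per client, not the counter itself. The returned number can run past the last row in the links table, so the process still has to handle a lookup that finds no link.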
