Keeping track of what a user has read on a web site

Sandman

Just looking for suggestions on how to do this in my Web application.

The goal is to keep track of what a user has and hasn't read and present him or
her with new material.

I am currently doing this by aggregating new content from all databases into a
single indexed database and then saving a timestamp in the account database
(for the current user) that tells me when the user last read items in the
aggregated database.

This works as designed, but I don't have detailed control. If the user opens
the page where new items are listed and reads one of the items (and doesn't reset
the timestamp), that item won't be removed from the list.

This is evident when I want to have a function in my page that alerts the user
if there is new content of a specific kind (for example: "1 new articles on
cooking"). That function will report that until the user has reset the
timestamp (by visiting the 'what's new?' page that lists all new articles). I
can't mark just this article as 'read'.

So, what options do I have? Well, each item has an ID, so if I should keep
track of read/unread I should base that on the IDs in the aggregation database.

Looking at how newsreaders (specifically those that make use of a .newsrc file)
do it, they keep track of series of IDs, like "12,14-67,69" - which in my case
could mean that the user has read the items with ID 13 and 68.

The aggregation database looks something like this:

ID | Kind     | Headline                   | Original ID
---+----------+----------------------------+------------
1  | article  | Home made pie              | 23
2  | article  | Hamburgers a'plenty        | 24
3  | forum    | Anyone likes strawberries? | 298
4  | comments | *Re: Home made pie         | 67

Get the idea? The ID is the id in the aggregated database, the kind says which
original database the content came from, and the original ID is the id in
that database.

So, if I go and read "Hamburgers a'plenty", it should perhaps update my profile
to say "1,3-4" or somesuch to note that I have read id number 2. Or perhaps I
should just keep track of all the IDs I have read? The aggregate database keeps
content around for about a month, which could mean thousands of items.

I am guessing that a MySQL query that looked like this:

"select * from aggregate where id not in(1,2,3,4,5,6,7,8.....1678)"

would be bad.

So, I am wondering how YOU would have done it - or are you already doing this in
one way or the other? I'm just venting here and hoping that someone will come
up with good suggestions on how to solve this in an efficient manner.

One way - I suppose - would be to make one SQL query that fetches all the
(potentially thousands of) IDs and all the headlines and puts them in an array
and then I run that array in a PHP function that filters out items based on the
"1,4-7,12-78" filter. I would also have to build PHP functions to maintain
these lists, but that's a minor problem.
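
A rough sketch of that filter in PHP, for what it's worth. The function names
(parse_ranges, in_ranges, filter_rows) are made up for illustration, and the
sketch assumes the list holds the IDs that have already been read, so those
rows are the ones dropped from the array.

```php
<?php
// Hypothetical helper: turn "1,4-7,12-78" into [[1, 1], [4, 7], [12, 78]].
function parse_ranges(string $list): array
{
    $ranges = [];
    foreach (explode(',', $list) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        $bounds = explode('-', $part, 2);
        $lo = (int) $bounds[0];
        $hi = isset($bounds[1]) ? (int) $bounds[1] : $lo;
        $ranges[] = [$lo, $hi];
    }
    return $ranges;
}

// True if $id falls inside any of the ranges (simple linear scan).
function in_ranges(int $id, array $ranges): bool
{
    foreach ($ranges as [$lo, $hi]) {
        if ($id >= $lo && $id <= $hi) {
            return true;
        }
    }
    return false;
}

// Keep only the rows whose id is NOT covered by the list.
function filter_rows(array $rows, string $list): array
{
    $ranges = parse_ranges($list);
    return array_values(array_filter(
        $rows,
        fn(array $row): bool => !in_ranges((int) $row['id'], $ranges)
    ));
}

// Rows as they might come back from the aggregate table.
$rows = [
    ['id' => 1, 'headline' => 'Home made pie'],
    ['id' => 2, 'headline' => "Hamburgers a'plenty"],
    ['id' => 3, 'headline' => 'Anyone likes strawberries?'],
    ['id' => 4, 'headline' => 'Re: Home made pie'],
];
$remaining = filter_rows($rows, '1,3-4'); // only id 2 is left
```

The list is parsed once per call, so the cost is one pass over the rows times
the number of ranges, which stays small as long as the ranges are merged.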

Anyway, any thoughts or suggestions are welcome.

--
Sandman[.net]

Aug 30 '05 #1


Gordon Burditt

> The goal is to keep track of what a user has and hasn't read and present him or
> her with new material
>
> I am currently doing this by aggregating new content from all databases into a
> single indexed database and then saving a timestamp in the account database
> (for the current user) that tells me when the user last read items in the
> aggregated database.
>
> This works as designed, but I don't have detailed control. If the user opens
> the page where new items are listed and reads one of the items (and doesn't reset
> the timestamp), that item won't be removed from the list.

This is one of the major limitations of timestamps.

> This is evident when I want to have a function in my page that alerts the user
> if there is new content of a specific kind (for example: "1 new articles on
> cooking"). That function will report that until the user has reset the
> timestamp (by visiting the 'what's new?' page that lists all new articles). I
> can't mark just this article as 'read'.
>
> So, what options do I have? Well, each item has an ID, so if I should keep
> track of read/unread I should base that on the IDs in the aggregation database.

I think it is better to keep track of what the user *HAS* read.
Why? You don't have to mess with user read lists when a new article
is added, but you DO have to mess with the unread list.

> Looking at how newsreaders (specifically those that make use of a .newsrc file)
> do it, they keep track of series of IDs, like "12,14-67,69" - which in my case
> could mean that the user has read the items with ID 13 and 68.

Newsreaders using .newsrc keep track of what the user HAS read.
The newsrc isn't edited when a new article shows up. The newsrc
approach of using ranges is a compact way to store what has been
read, but a bit awkward to manipulate. Also, if the user interface
has a "catch-up" function (mark everything read), this collapses
down to a single range starting with the lowest possible ID (e.g.
1).
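
To give an idea of the manipulation involved, here is a minimal PHP sketch of
marking one ID as read and of the catch-up case. The function names are
hypothetical, ranges are held as [lo, hi] pairs, and the lowest possible ID is
assumed to be 1.

```php
<?php
// Hypothetical helper: add one ID to a list of [lo, hi] ranges and
// merge any ranges that overlap or touch.
function mark_read(array $ranges, int $id): array
{
    $ranges[] = [$id, $id];
    usort($ranges, fn(array $a, array $b): int => $a[0] <=> $b[0]);

    $merged = [];
    foreach ($ranges as [$lo, $hi]) {
        $last = count($merged) - 1;
        if ($last >= 0 && $lo <= $merged[$last][1] + 1) {
            // Overlapping or adjacent: extend the previous range.
            $merged[$last][1] = max($merged[$last][1], $hi);
        } else {
            $merged[] = [$lo, $hi];
        }
    }
    return $merged;
}

// "Catch-up": everything up to the highest current ID counts as read.
function catch_up(int $maxId): array
{
    return [[1, $maxId]];
}

// Serialize back to the "12,14-67,69" text form.
function ranges_to_string(array $ranges): string
{
    return implode(',', array_map(
        fn(array $r): string => $r[0] === $r[1] ? (string) $r[0] : "{$r[0]}-{$r[1]}",
        $ranges
    ));
}

$read     = [[12, 12], [14, 67], [69, 69]];
$afterOne = ranges_to_string(mark_read($read, 13));                // "12-67,69"
$afterTwo = ranges_to_string(mark_read(mark_read($read, 13), 68)); // "12-69"
$caughtUp = ranges_to_string(catch_up(1678));                      // "1-1678"
```

Reading the two missing items shrinks the stored string rather than growing
it, which is where the compactness of the range form comes from.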

> The aggregation database looks something like this:
>
> ID | Kind     | Headline                   | Original ID
> ---+----------+----------------------------+------------
> 1  | article  | Home made pie              | 23
> 2  | article  | Hamburgers a'plenty        | 24
> 3  | forum    | Anyone likes strawberries? | 298
> 4  | comments | *Re: Home made pie         | 67
>
> Get the idea? The ID is the id in the aggregated database, the kind is from
> what original database the content came from and the original ID is the id in
> that database

Another approach is to keep a SQL table containing user ID and
article ID (and forum ID, if there's more than one). An entry in
that table means that user has read that article.

> So, if I go and read "Hamburgers a'plenty", it should perhaps update my profile
> to say "1,3-4" or somesuch to note that I have read id number 2. Or perhaps I
> should just keep track of all the IDs I have read? The aggregate database keeps
> content around for about a month, which could mean thousands of items.
>
> I am guessing that a MySQL query that looked like this:
>
> "select * from aggregate where id not in(1,2,3,4,5,6,7,8.....1678)"

select aggregate.* from aggregate LEFT JOIN readlist on aggregate.id = readlist.id
and readlist.userid = 'this guys user id' where readlist.id is null;

gets you a list of all articles this guy hasn't read. A problem
with this approach is that readlist grows continuously over time.
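
A small PHP sketch of the readlist idea, using PDO with an in-memory SQLite
database as a stand-in for MySQL so it can be run on its own. The column
layout and the function names are assumptions. The pruning step is not part
of the suggestion above; it is one possible way to keep readlist from growing
forever, assuming items really do expire from aggregate after about a month.

```php
<?php
// In-memory SQLite stands in for MySQL here; the column layout is an assumption.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE aggregate (id INTEGER PRIMARY KEY, kind TEXT, headline TEXT)');
$db->exec('CREATE TABLE readlist (userid INTEGER, id INTEGER, PRIMARY KEY (userid, id))');
$db->exec("INSERT INTO aggregate VALUES
    (2, 'article', 'Hamburgers a''plenty'),
    (3, 'forum', 'Anyone likes strawberries?'),
    (4, 'comments', 'Re: Home made pie')");
// User 7 has read items 1 and 2; item 1 has already expired from aggregate.
$db->exec('INSERT INTO readlist VALUES (7, 1), (7, 2)');

// Unread items for one user, with the user id bound as a parameter.
function unread_ids(PDO $db, int $userId): array
{
    $stmt = $db->prepare(
        'SELECT aggregate.id FROM aggregate
         LEFT JOIN readlist
           ON aggregate.id = readlist.id AND readlist.userid = ?
         WHERE readlist.id IS NULL
         ORDER BY aggregate.id'
    );
    $stmt->execute([$userId]);
    return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}

// Drop read markers for items that have expired from aggregate,
// so readlist stays bounded by the size of aggregate.
function prune_readlist(PDO $db): int
{
    return $db->exec('DELETE FROM readlist WHERE id NOT IN (SELECT id FROM aggregate)');
}

$unread = unread_ids($db, 7);  // items 3 and 4
$pruned = prune_readlist($db); // one stale row removed
```

With the primary key on (userid, id), the join only has to probe the index
once per aggregate row, so the lookup cost follows the size of aggregate
rather than the size of readlist.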

> So, I am wondering how YOU would have done it - or are you already doing this in
> one way or the other? I'm just venting here and hoping that someone will come
> up with good suggestions on how to solve this in an efficient manner.

The .newsrc approach isn't too bad: it assumes that articles are
created in sequential order, and that getting the high id of current
articles is fairly easy. (select max(id) from aggregate). Many
newsreaders assume that if the article id <= the max and id >= the
min not yet expired and it's not in the newsrc list, there's a
pretty good chance that it actually exists. If it later discovers
that the article does not exist (say, trying to fetch it or its
subject line), it marks it read.
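
The min/max assumption can be sketched in PHP as follows. The function name
is made up, and the read list is assumed to be sorted, non-overlapping
[lo, hi] pairs; everything between the lowest unexpired ID and the highest
current ID that is not covered by the list is treated as unread.

```php
<?php
// Hypothetical helper: given the lowest unexpired ID, the highest current
// ID, and sorted non-overlapping [lo, hi] read ranges, return the unread
// ranges in between.
function unread_ranges(int $minId, int $maxId, array $read): array
{
    $unread = [];
    $next = $minId; // first ID not yet accounted for
    foreach ($read as [$lo, $hi]) {
        if ($hi < $next) {
            continue; // range lies entirely below the window
        }
        if ($lo > $maxId) {
            break; // range lies entirely above the window
        }
        if ($lo > $next) {
            $unread[] = [$next, $lo - 1];
        }
        $next = $hi + 1;
    }
    if ($next <= $maxId) {
        $unread[] = [$next, $maxId];
    }
    return $unread;
}

// Read list "12,14-67,69" with current articles numbered 10 through 80.
$unread = unread_ranges(10, 80, [[12, 12], [14, 67], [69, 69]]);
// $unread is [[10, 11], [13, 13], [68, 68], [70, 80]]
```

An ID inside one of these ranges may still turn out not to exist, in which
case it would simply be marked read, as described above.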

You could try putting the list of ranges into SQL. It saves storage.
Chances are, you'd need to wipe out and re-store the entire list
of ranges for a particular user (for a particular forum, if there's
more than one) every time.

Gordon L. Burditt

Aug 30 '05 #2

Sandman

In article <11*************@corp.supernews.com>,
go***********@burditt.org (Gordon Burditt) wrote:

>> This is evident when I want to have a function in my page that alerts
>> the user if there is new content of a specific kind (for example: "1
>> new articles on cooking"). That function will report that until the
>> user has reset the timestamp (by visiting the 'what's new?' page that
>> lists all new articles). I can't mark just this article as 'read'.
>>
>> So, what options do I have? Well, each item has an ID, so if I
>> should keep track of read/unread I should base that on the IDs in the
>> aggregation database.
>
> I think it is better to keep track of what the user *HAS* read. Why?
> You don't have to mess with user read lists when a new article is
> added, but you DO have to mess with the unread list.

Good point.

>> Looking at how newsreaders (specifically those that make use of a
>> .newsrc file) do it, they keep track of series of IDs, like
>> "12,14-67,69" - which in my case could mean that the user has read
>> the items with ID 13 and 68.
>
> Newsreaders using .newsrc keep track of what the user HAS read.
> The newsrc isn't edited when a new article shows up. The newsrc
> approach of using ranges is a compact way to store what has been
> read, but a bit awkward to manipulate. Also, if the user interface
> has a "catch-up" function (mark everything read), this collapses
> down to a single range starting with the lowest possible ID (e.g.
> 1).

Exactly.

>> The aggregation database looks something like this:
>>
>> ID | Kind     | Headline                   | Original ID
>> ---+----------+----------------------------+------------
>> 1  | article  | Home made pie              | 23
>> 2  | article  | Hamburgers a'plenty        | 24
>> 3  | forum    | Anyone likes strawberries? | 298
>> 4  | comments | *Re: Home made pie         | 67
>>
>> Get the idea? The ID is the id in the aggregated database, the kind
>> is from what original database the content came from and the original
>> ID is the id in that database
>
> Another approach is to keep a SQL table containing user ID and
> article ID (and forum ID, if there's more than one). An entry in
> that table means that user has read that article.

Yes, but lookups in that table would take time, especially as time goes by and
new articles and new forum posts arrive. The aggregate database is there to
check up only on what's new and contains nothing but the last month's fresh
items.

>> So, if I go and read "Hamburgers a'plenty", it should perhaps update
>> my profile to say "1,3-4" or somesuch to note that I have read id
>> number 2. Or perhaps I should just keep track of all the IDs I have
>> read? The aggregate database keeps content around for about a month,
>> which could mean thousands of items.
>>
>> I am guessing that a MySQL query that looked like this:
>> "select * from aggregate where id not in(1,2,3,4,5,6,7,8.....1678)"
>
> select aggregate.* from aggregate LEFT JOIN readlist on aggregate.id =
> readlist.id and readlist.userid = 'this guys user id' where
> readlist.id is null;
>
> gets you a list of all articles this guy hasn't read. A problem with
> this approach is that readlist grows continuously over time.

Exactly.

>> So, I am wondering how YOU would have done it - or are you already doing
>> this in one way or the other? I'm just venting here and hoping that
>> someone will come up with good suggestions on how to solve this in an
>> efficient manner.
>
> The .newsrc approach isn't too bad: it assumes that articles are
> created in sequential order, and that getting the high id of current
> articles is fairly easy. (select max(id) from aggregate). Many
> newsreaders assume that if the article id <= the max and id >= the min
> not yet expired and it's not in the newsrc list, there's a pretty good
> chance that it actually exists. If it later discovers that the
> article does not exist (say, trying to fetch it or its subject line),
> it marks it read.
>
> You could try putting the list of ranges into SQL. It saves storage.
> Chances are, you'd need to wipe out and re-store the entire list of
> ranges for a particular user (for a particular forum, if there's more
> than one) every time.

Exactly. I'll probably store the id access line in the user table, from where I
load info about the current user at the beginning of each page load. Like this:

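<?php
// Escape the cookie values before putting them into the query.
$email  = mysql_real_escape_string($_COOKIE['email']);
$passwd = mysql_real_escape_string($_COOKIE['passwd']);
$q = mysql_query("select * from member where email = '$email'
                  and md5pass = '$passwd'");
$user = mysql_fetch_array($q);
?>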

That way, I could have a function that takes the current kind and id and
matches it to the list, something like this:

<?php
$user["read"] = "article=12,23-56;forum=23,56-12989";
# $article = array of current article

if (has_read($user["read"], $article["id"], "article")) {
    # The user has read this
}
?>

But the has_read() function needs to be really efficient, since when listing 40
articles, it will be called with each article id to check read status.
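
One way it might be kept cheap, sketched in PHP: parse the stored line once
per page load, then do a binary search per article. The names parse_read_line
and has_read_parsed are made up, and the second function takes the parsed
array rather than the raw string used in the call above. The sketch assumes
the ranges within one kind do not overlap.

```php
<?php
// Parse the stored line once per page load into sorted [lo, hi] ranges per kind,
// e.g. ['article' => [[12, 12], [23, 56]], 'forum' => [[23, 23], [56, 12989]]].
function parse_read_line(string $line): array
{
    $byKind = [];
    foreach (explode(';', $line) as $section) {
        if (strpos($section, '=') === false) {
            continue; // skip empty or malformed sections
        }
        [$kind, $list] = explode('=', $section, 2);
        $ranges = [];
        foreach (explode(',', $list) as $part) {
            if (trim($part) === '') {
                continue;
            }
            $bounds = explode('-', $part, 2);
            $lo = (int) $bounds[0];
            $ranges[] = [$lo, isset($bounds[1]) ? (int) $bounds[1] : $lo];
        }
        // The binary search below needs the ranges ordered by lower bound.
        usort($ranges, fn(array $a, array $b): int => $a[0] <=> $b[0]);
        $byKind[trim($kind)] = $ranges;
    }
    return $byKind;
}

// Binary search over the sorted ranges of one kind.
function has_read_parsed(array $byKind, int $id, string $kind): bool
{
    $ranges = $byKind[$kind] ?? [];
    $lo = 0;
    $hi = count($ranges) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($id < $ranges[$mid][0]) {
            $hi = $mid - 1;
        } elseif ($id > $ranges[$mid][1]) {
            $lo = $mid + 1;
        } else {
            return true;
        }
    }
    return false;
}

$read = parse_read_line('article=12,23-56;forum=23,56-12989');
$readArticle40 = has_read_parsed($read, 40, 'article'); // inside 23-56
$readArticle13 = has_read_parsed($read, 13, 'article'); // not in the list
```

Listing 40 articles then costs one parse plus 40 short searches, so the
number of ranges in the list barely matters.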

*ponders*

--
Sandman[.net]

Aug 30 '05 #3
