Discussion:
limit a table to 10 rows whenever an insert is done
Bible Trivia
2010-08-04 01:58:44 UTC
Hello-

I'm trying to design a database to handle history for
individual users, potentially thousands or more down the
line. I want to limit a table to 10 total rows, no matter
how many single-row inserts are done.

What I'm really wanting is a MySQL 'queue', in a way:
when a new row is inserted, the oldest row gets discarded.

If this isn't possible currently, does it 'cost' me a lot
to do an INSERT and then a DELETE all at once
rather than running cron jobs at night to do this?

I'm concerned that if the DB grows larger and larger,
the cron jobs will take longer and longer and slow down
overall performance. The DB is a 24-hour-a-day DB,
so there isn't really a 'good' time to run cron jobs.

Thanks... I'm still kinda new to MySQL and databases.
Gordon Burditt
2010-08-04 07:16:49 UTC
Post by Bible Trivia
I'm trying to design a database to handle history for
individual users, potentially thousands or more down the
line. I want to limit a table to 10 total rows, no matter
how many single-row inserts are done.
Um, you want to handle thousands of users, but you only want 10
rows in the table. Why? What's *in* this table? A history of
what? Or did you mean 10 rows *per user*?

How do you identify which rows to keep and which rows to delete?

I expect no issues with 10 rows in a table, unless it gets
ridiculous with 10k columns in each row.

At what rate do you expect rows to be added to this table when you
have, say, 10,000 users? (5 per hour? 5,000 per hour?) What
problem would you expect if a cron job to clean out the old records
were run only once on Feb. 29?
Post by Bible Trivia
What I'm really wanting is a MySQL 'queue', in a way:
when a new row is inserted, the oldest row gets discarded.
Does that mean that *missing* processing a row is a *problem*?
Failing to ship a customer's order is rather more than an
efficiency issue.
Post by Bible Trivia
If this isn't possible currently, does it 'cost' me a lot
to do an INSERT and then a DELETE all at once
rather than running cron jobs at night to do this?
Deleting a large number of records may cause the table to be locked
for an unacceptably long time. By that I'm thinking of deleting,
say, 2 million records in a table of 25 million. And that was with
older versions of MySQL and hardware that would be considered slow
today. "unacceptably long" might be holding a table lock so a web
transaction takes more than one minute.
Post by Bible Trivia
I'm concerned that if the DB grows larger and larger,
the cron jobs will take longer and longer and slow down
overall performance. The DB is a 24-hour-a-day DB,
so there isn't really a 'good' time to run cron jobs.
You can probably determine that there is, in fact, a time when
the load on the database is low (due to, say, a large percentage
of your users being asleep and therefore not using your site),
and you can afford to run cron jobs.
Bible Trivia
2010-08-05 01:58:38 UTC
Post by Gordon Burditt
Post by Bible Trivia
I'm trying to design a database to handle history for
individual users, potentially thousands or more down the
line.  I want to limit a table to 10 total rows, no matter
how many single-row inserts are done.
Um, you want to handle thousands of users, but you only want 10
rows in the table.  Why?  What's *in* this table?  A history of
what?  Or did you mean 10 rows *per user*?
Sorry, I didn't explain that well. I have a table of 'users'. For
each user there is a separate table that contains their GPS history.
I'm working on a GPS application. For users who have the reduced
feature set, there will be fewer entries in their GPS tracking
history; more for the 'advanced' feature set.
Post by Gordon Burditt
How do you identify which rows to keep and which rows to delete?
I want to insert the latest GPS coordinate at the end of the table and
remove the first entry (given a 10-entry table, insert item 11, remove
item 1).
Post by Gordon Burditt
I expect no issues with 10 rows in a table, unless it gets
ridiculous with 10k columns in each row.
There aren't many columns in the history table, about 7 or so,
maybe a few more over time, but not many.
Post by Gordon Burditt
At what rate do you expect rows to be added to this table when you
have, say, 10,000 users?  (5 per hour?  5,000 per hour?)  What
problem would you expect if a cron job to clean out the old records
were run only once on Feb. 29?
If I have 1000 users updating every 15 minutes (or less) you can see
that the hits to the server/DB will increase over time. I'm already
considering what to do if I need multiple servers and DB replication
as well.
Post by Gordon Burditt
Post by Bible Trivia
What I'm really wanting is a MySQL 'queue', in a way:
when a new row is inserted, the oldest row gets discarded.
Does that mean that *missing* processing a row is a *problem*?
Failing to ship a customer's order is rather more than an
efficiency issue.
If the user misses one it's not the end of the world, no; just trying
to think through all the conditions, etc.
Post by Gordon Burditt
Post by Bible Trivia
If this isn't possible currently, does it 'cost' me a lot
to do an INSERT and then a DELETE all at once
rather than running cron jobs at night to do this?
Deleting a large number of records may cause the table to be locked
for an unacceptably long time.  By that I'm thinking of deleting,
say, 2 million records in a table of 25 million.  And that was with
older versions of MySQL and hardware that would be considered slow
today.  "unacceptably long" might be holding a table lock so a web
transaction takes more than one minute.
If I had to lock it up for 2-5 minutes once per night to update
2 million records, that would be very acceptable, although I wonder
what trade-off there is in doing it all in one client connection
rather than a cron job; but that's a lesser issue at the moment if
that (the time to update) is the case.
Post by Gordon Burditt
Post by Bible Trivia
I'm concerned that if the DB grows larger and larger,
the cron jobs will take longer and longer and slow down
overall performance.  The DB is a 24-hour-a-day DB,
so there isn't really a 'good' time to run cron jobs.
You can probably determine that there is, in fact, a time when
the load on the database is low (due to, say, a large percentage
of your users being asleep and therefore not using your site),
and you can afford to run cron jobs.
Depending on how many users are spread out over the globe, yes.
At night the application won't send nearly as often, so yes, that
would work.

Thanks for your feedback, I appreciate it
Gordon Burditt
2010-08-05 03:41:22 UTC
Post by Bible Trivia
Sorry, I didn't explain that well. I have a table of 'users'. For
each user there is a separate table that contains their GPS history.
One table per user is a potentially serious performance bottleneck,
as that is likely to involve one directory per user at the OS level.
When you get a million users, this can start to hurt. Some OSs
might hard-limit the number of subdirectories to something around
32767 or 65535.

Separating the data into individual tables makes it much harder to
do other stuff like "What users are within 500 feet of this user?",
or "How many of my users are in Europe?" or "How many users have
reported their location in the last hour?".

I recommend using one table for locations and adding a 'user' column
to distinguish them, then putting all the records in one table.
Use an index on user or (user,timestamp).
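A minimal sketch of that layout (table and column names are just illustrative, and Python's sqlite3 stands in for MySQL here; the DDL idea is the same):

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")

# One table for all users' locations, with a 'user' column to tell them apart.
conn.execute("""
    CREATE TABLE locations (
        user  TEXT  NOT NULL,   -- which user this fix belongs to
        lat   REAL  NOT NULL,
        lon   REAL  NOT NULL,
        ts    TEXT  NOT NULL    -- timestamp of the GPS fix
    )
""")
# Composite index so per-user, time-ordered queries stay cheap.
conn.execute("CREATE INDEX idx_user_ts ON locations (user, ts)")

conn.execute("INSERT INTO locations VALUES ('alice', 40.0, -105.0, '2010-08-05 12:00:00')")
rows = conn.execute("SELECT lat, lon FROM locations WHERE user = ?", ("alice",)).fetchall()
print(rows)  # [(40.0, -105.0)]
```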
Post by Bible Trivia
I'm working on a GPS application. For users who have
the reduced feature set, there will be fewer entries in their GPS
tracking history; more for the 'advanced' feature set.
Post by Gordon Burditt
How do you identify which rows to keep and which rows to delete?
I want to insert the latest GPS coordinate at the end of the table and
remove the first entry (given a 10-entry table, insert item 11, remove
item 1).
From the database point of view, rows are *NOT* inserted in an
order, and they are not numbered in an order (unless you arrange
it yourself with your queries, which gets time-consuming if you modify
all rows every time you add one). Rows are retrieved in an order if
you use ORDER BY. It helps to understand this when attacking the
problem.

Using an application language (like, say, PHP) and assuming you have
a datetime column ts, you can do a query like:
(first insert a new row, if wanted)

SELECT ts FROM locations
WHERE user = '$user'
ORDER BY ts DESC
LIMIT 9,1;

This retrieves the 10th-newest row, if there is one, ordering by
timestamp, most recent first. *IF* a row is retrieved, you can then do:

DELETE FROM locations
WHERE user = '$user' AND ts < '$ts';

which deletes anything older than the 10th-newest record (according to
its timestamp). If a row is not retrieved, you don't have 10 rows,
so don't delete anything. This happens the first few times with a
new user. The DELETE might not delete anything (if you just added
the 10th record). If you skip these queries for a while, one run of
them will clean up all the old stuff left behind for a particular user.
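The same two-step trim can be driven from application code. A sketch, again with sqlite3 standing in for MySQL (and parameterized queries instead of interpolating '$user' into the string):

```python
import sqlite3

KEEP = 10  # rows of history to retain per user

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (user TEXT, lat REAL, lon REAL, ts INTEGER)")

def record_location(conn, user, lat, lon, ts):
    """Insert a new fix, then delete anything older than the KEEP-th newest."""
    conn.execute("INSERT INTO locations VALUES (?, ?, ?, ?)", (user, lat, lon, ts))
    # Find the timestamp of the KEEP-th newest row, if there is one.
    row = conn.execute(
        "SELECT ts FROM locations WHERE user = ? "
        "ORDER BY ts DESC LIMIT 1 OFFSET ?",
        (user, KEEP - 1),
    ).fetchone()
    if row is not None:  # fewer than KEEP rows -> nothing to trim yet
        conn.execute(
            "DELETE FROM locations WHERE user = ? AND ts < ?", (user, row[0])
        )

for t in range(25):  # 25 inserts, but only the newest 10 survive
    record_location(conn, "alice", 40.0, -105.0, t)

count = conn.execute(
    "SELECT COUNT(*) FROM locations WHERE user = ?", ("alice",)
).fetchone()[0]
print(count)  # 10
```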
Post by Bible Trivia
If I have 1000 users updating every 15 minutes (or less) you can see
that the hits to the server/DB will increase over time. I'm already
considering what to do if I need multiple servers and DB replication
as well.
That's a little more than one update a second (with typically 3
queries each time). I don't know what else you are doing, but this
isn't much load.

Careful with replication: you typically have to route all the
*changes* to one master server, and adding slaves won't help. You
haven't indicated what *uses* this history table at all.
Bible Trivia
2010-08-06 00:27:49 UTC
Hi Gordon-
Post by Gordon Burditt
I recommend using one table for locations and adding a 'user' column
to distinguish them, then putting all the records in one table.
Use an index on user or (user,timestamp).
I can see where this is good advice. I was assuming (and one shouldn't
assume in many situations in life) that MySQL could handle multiple
millions of entries, so I thought that this wasn't an issue, but I can
see how it would be. I didn't know how it was implemented under the
hood.

So is there a good way to put all the data into one table where
that table contains the history as well? Or are you suggesting
2 tables: one for basic user account info, and a second table that
contains all the users with their history?

Either way, how do I set up the second table (or even the first)?
I'd like either way to be flexible, so that user A could have 10 history
elements while user B might have 50, and user C might even have 100.
I would think I could just insert the data into the table as follows:

Columns:
user lat long datetime

where 'user' is the primary key,
then just insert away. When I look up information, just do a
SELECT * FROM table WHERE user='userid'. I'd imagine MySQL
can do a lookup in a handful of operations or fewer, even on a million
records, so the retrieval of history wouldn't take much time?
Post by Gordon Burditt
From the database point of view, rows are *NOT* inserted in an
order, and they are not numbered in an order (unless you arrange
it yourself with your queries, which gets time-consuming if you modify
all rows every time you add one).  Rows are retrieved in an order if
you use ORDER BY.  It helps to understand this when attacking the
problem.
Actually I knew this; I was going to use ORDER BY if needed. However,
I'm laying them out in a PHP script which overlays a map, so it's not
important at this time, but good to know.
Post by Gordon Burditt
That's a little more than one update a second (with typically 3
queries each time).  I don't know what else you are doing, but this
isn't much load.
I'm hoping that if the randomness of the GPS updates over time
from all users starts to 'cluster', I can somehow tell the application
to spread them out a bit to spread the load on the server (eventually).
Probably rev 2 or 3 of the product.
Post by Gordon Burditt
Careful with replication:  you typically have to route all the
*changes* to one master server, and adding slaves won't help.  You
haven't indicated what *uses* this history table at all.
Yes, I was a bit leery about using replication as well; glad to hear
a second opinion on it, I'll avoid it. As for the history, it allows
users to see what their history was, or authorized other users to view
their history. It can be downloaded, viewed via Google Maps, etc.

I appreciate your feedback, thanks!
Gordon Burditt
2010-08-09 03:31:01 UTC
Post by Bible Trivia
Post by Gordon Burditt
I recommend using one table for locations and adding a 'user' column
to distinguish them, then putting all the records in one table.
Use an index on user or (user,timestamp).
I can see where this is good advice. I was assuming (and one shouldn't
assume in many situations in life) that MySQL could handle multiple
millions of entries, so I thought that this wasn't an issue, but I can
see how it would be. I didn't know how it was implemented under the
hood.
MySQL can handle multiple millions of rows in a table.
Multiple millions of tables in a database, however, is problematic
performance-wise, and it's also generally considered bad database design.
If you're considering:
one table per user,
one table per month,
one table per employee,
one table per sale,
one table per item sold,
or anything else that involves constructing table names out of
concatenated strings
think carefully whether you want to do this.

Exceptions might happen if for security or administrative reasons,
you want data kept strictly separate, and put them in separate
databases (possibly on separate machines). Example: your company
does payroll for hundreds of companies, and you put the data for
each in a separate database. There's no reason to do queries over
multiple company data, and plenty of reason to make sure nobody has
the authority to do it. Putting multiple company data in the same
table can become inconvenient if a company wants all its data erased
*including writing over deleted records repeatedly* and perhaps
physical destruction of the hard disk containing it.
Post by Bible Trivia
So is there a good way to put all the data into one table where
that table contains the history as well? Or are you suggesting
2 tables, one for basic user account info and the second table
contains all the users with their history?
User data probably contains a bunch of stuff inappropriate to
location history, such as password, email address, profile settings,
etc., and I don't think you want to duplicate all that. You also
probably want a table where the user name (string) and/or user id
(numeric) has a unique index to forbid duplications. Don't stuff
*everything* in one table. But don't just divide it by user, either.
A one-to-many relationship (say, a list of favorite locations)
usually requires another table. A many-to-many relationship (e.g.
user "friends" lists to view history of another user) usually
requires two more tables (three total: two entities and one for
the relationship between them).
Post by Bible Trivia
Either way, how do I set up the second table (or even the first)?
I'd like either way to be flexible, so that user A could have 10 history
elements while user B might have 50, and user C might even have 100.
The query that deletes extra entries for a user takes a number for the
limit. That number can vary by user. You'd probably keep the number
in the user table.
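Under that scheme the trim just reads the cap from the user table first. A sketch (sqlite3 standing in for MySQL; table and column names like `history_limit` are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user TEXT PRIMARY KEY, history_limit INTEGER)")
conn.execute("CREATE TABLE locations (user TEXT, ts INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("basic_user", 10), ("advanced_user", 50)])

def trim_history(conn, user):
    """Delete rows older than this user's own cap, read from the users table."""
    (limit,) = conn.execute(
        "SELECT history_limit FROM users WHERE user = ?", (user,)
    ).fetchone()
    # Timestamp of the limit-th newest row, if that many rows exist.
    row = conn.execute(
        "SELECT ts FROM locations WHERE user = ? ORDER BY ts DESC LIMIT 1 OFFSET ?",
        (user, limit - 1),
    ).fetchone()
    if row is not None:
        conn.execute("DELETE FROM locations WHERE user = ? AND ts < ?",
                     (user, row[0]))

for t in range(80):  # both users log 80 fixes; each keeps only their cap
    conn.execute("INSERT INTO locations VALUES (?, ?)", ("basic_user", t))
    conn.execute("INSERT INTO locations VALUES (?, ?)", ("advanced_user", t))
    trim_history(conn, "basic_user")
    trim_history(conn, "advanced_user")

counts = {u: conn.execute("SELECT COUNT(*) FROM locations WHERE user = ?",
                          (u,)).fetchone()[0]
          for u in ("basic_user", "advanced_user")}
print(counts)  # {'basic_user': 10, 'advanced_user': 50}
```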
Post by Bible Trivia
user lat long datetime
where 'user' is the primary key,
'user' alone can't be the primary key, since you are inserting
more than one record per user. Perhaps (user, datetime) makes
a good primary key.
Post by Bible Trivia
then just insert away. When I look up information, just do a
SELECT * FROM table WHERE user='userid'.
perhaps with ORDER BY `datetime` DESC, and maybe a LIMIT clause
if you only want the current location.
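For example (sqlite3 as a stand-in; the query shape is the same in MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (user TEXT, lat REAL, lon REAL, ts INTEGER)")
conn.executemany("INSERT INTO locations VALUES (?, ?, ?, ?)", [
    ("alice", 40.0, -105.0, 1),
    ("alice", 40.1, -105.1, 2),
    ("alice", 40.2, -105.2, 3),  # newest fix
])

# Full history, newest first.
history = conn.execute(
    "SELECT lat, lon FROM locations WHERE user = ? ORDER BY ts DESC",
    ("alice",),
).fetchall()

# Just the current location: same query, with LIMIT 1.
current = conn.execute(
    "SELECT lat, lon FROM locations WHERE user = ? ORDER BY ts DESC LIMIT 1",
    ("alice",),
).fetchone()
print(current)  # (40.2, -105.2)
```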
