[WishList] Restructure SQL store for clicktracking

Simply put - the current SQL recording for clicktracking is extremely inefficient and expensive in server queries.
On a new install with only one email sent to 4,300 users (and sending not yet complete) the SQL _linktrack table already has 10,398 rows - this is huge, and will rapidly grind the server to a standstill as it explodes over a short series of emails. Additionally, it may be the cause, or one of the causes, of the massive number of users reporting unacceptably frequent and persistent Server 500 errors when process_queue is triggered - the sheer number of entries to write up front bursting the server's resource allotment.
Currently click tracking uses two tables - _linktrack for the pairing of user to link sent, and _linktrack_userclick for recording links actioned by user.
** The _linktrack table needs to be broken apart, as do the queries that create it - there should be no user association in _linktrack - it should be solely a table pairing the original URL with the phpList redirect URL. The original URLs need unique keys and should never be repeated (e.g. domain.com/phplist/index.php?p=subscribe1 should only ever appear in this table once - not once for every user on every message, and again every time a message is saved during editing).
** Another table, _linktrack_inmessage, needs to be structured, recording the relationship between the _linktrack URL pairing and each message created - this is done at the point the message is saved, not at send time. If a new message has only one link in it, only one row is added to this table (with a unique key); if 10 links are in the message, 10 rows are added (each with a unique key).
In this way, original URLs re-used in multiple messages still have only a single entry in _linktrack, and the redirect URL is updated per message in _linktrack_inmessage - for example -
The redirect URL in _linktrack is structured something like - domain.com/[target_ID]
where [target_ID] is the _linktrack unique key.
Because the original URL is very likely to have a high repetition rate across time, it does not need one entry for every message for every user - that is grossly inefficient.
In _linktrack_inmessage the target id from _linktrack is a secondary key, and the message id (taken from the _message table) is another secondary key - the pairing of the two yields a new unique key.
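Roughly, those two tables might look something like the sketch below (the column names are just my illustration, not an existing phpList schema, and the usual phplist_ table prefix is assumed):

    CREATE TABLE phplist_linktrack (
      id  INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- the [target_ID] used in the redirect URL
      url VARCHAR(255) NOT NULL,                 -- the original destination URL
      PRIMARY KEY (id),
      UNIQUE KEY url (url)                       -- each original URL appears exactly once
    );

    CREATE TABLE phplist_linktrack_inmessage (
      id          INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- unique key for "this URL in this message"
      linktrackid INT UNSIGNED NOT NULL,                   -- secondary key -> phplist_linktrack.id
      messageid   INT UNSIGNED NOT NULL,                   -- secondary key -> phplist_message.id
      PRIMARY KEY (id),
      UNIQUE KEY link_in_message (linktrackid, messageid)  -- the pairing itself must be unique
    );

Rows are written to _linktrack_inmessage when the message is saved, one per distinct link in the message.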
Having structured that, new link tracking options become available at an intermediate tracking level:
- track total clicks per same URL across all messages
- track total clicks per same URL across a single message to all recipient users
The second option in particular would be exceptionally useful for message copywriters, letting them see how their message structure and content affect the rate of subscribes and unsubscribes to their lists. Action on the common (un)subscribe links could be tied directly to each message, and the message scrutinised for what caused that recipient behaviour.
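To make those two intermediate levels concrete, the queries could look something like this - assuming clicks end up in a per-click log keyed by the _linktrack_inmessage id (the _linktrack_actions table suggested further down, or a reworked _linktrack_userclick); names are illustrative only:

    -- total clicks on one original URL across all messages
    SELECT COUNT(*) AS total_clicks
    FROM phplist_linktrack_actions a
    JOIN phplist_linktrack_inmessage im ON im.id = a.inmessageid
    WHERE im.linktrackid = 42;               -- 42 = the URL's _linktrack id

    -- clicks on the same URL, broken down per message
    SELECT im.messageid, COUNT(*) AS clicks
    FROM phplist_linktrack_actions a
    JOIN phplist_linktrack_inmessage im ON im.id = a.inmessageid
    WHERE im.linktrackid = 42
    GROUP BY im.messageid;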
Then we have to track user receipt and action of individual URLs - this is where having the unique ID from _linktrack_inmessage comes into play.
Either in a table like _user_user_history or in a new table ( _user_user_linkhistory ?) the _inmessage unique key is recorded in an array as having been sent to that user. Action on any link can then also be recorded in a separate column, also as an array - "did this user click this link? yes / no" then becomes possible. Multiple actions on a link by the same user could be structured into the array as (link_id) (action_count) pairings - I realise this may result in large cell contents for some users (webmasters as well as subscribers), but I'm typing this as I think it, so this element may need further thought and ideas.
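Purely to illustrate that idea as written (with the same caveat that it needs further thought, and column names are again my own), the array-per-user version might look like:

    CREATE TABLE phplist_user_user_linkhistory (
      userid  INT UNSIGNED NOT NULL,  -- -> phplist_user_user.id
      sent    TEXT,                   -- serialised list of _linktrack_inmessage ids sent to this user
      clicked TEXT,                   -- serialised (inmessage_id, action_count) pairs
      PRIMARY KEY (userid)
    );

One obvious drawback is that serialised columns like these cannot be indexed or queried per link in plain SQL, which points towards the click-actions table suggested further down instead.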
One option may be to have a secondary script and database, whose sole job is recording the click actions of users, moving the load out of the message process server and database - feedback and thoughts please. Let's be honest, just because it's convention to do so, there is nothing set in concrete that says an open-source web application must be self-contained and use a single database, is there?
Additionally, the structure of the redirect URL needs to be made more user friendly for customer confidence purposes - this in itself would allow better recording and tracking. Just a wildcard thought on this before I rush off to a meeting .... a structure along the lines of domain.com/[prefix_linktrack_ID]/[_inmessage_ID]/[user_ID], where the following applies (a rough lookup sketch follows the list):
- [prefix_linktrack_ID] - prefix = admin_ID or list_ID and is a routing item; linktrack_ID is the unique key from _linktrack which, via the logic code, yields the original URL to redirect the subscriber to.
- [_inmessage_ID] is the unique key from _linktrack_inmessage (proposed above)
- [user_ID] is multi-purpose - it could be the direct user_ID from _user_user or it could be a concatenation of several keys - e.g. user+list+admin
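As a concrete (and purely hypothetical) example of how that would work, a published link of roughly domain.com/2-15/987/4321 would decompose into prefix 2 (the routing admin or list id), linktrack_ID 15, _inmessage_ID 987 and user_ID 4321, and resolving the redirect then only needs a single-row lookup:

    -- resolve the original destination URL from the linktrack_ID in the request
    SELECT url
    FROM phplist_linktrack
    WHERE id = 15;

with the click itself written to the actions table proposed in the next paragraph.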
Structuring the redirect (published) URL like this would allow the code to use a dedicated table, _linktrack_actions (?), where only received clicks are recorded, with all the relevant data - instead of the current system of bloating _linktrack, or the half-formed array-in-user-tables idea in the middle of this post. After all, we only need to record successful actions in click tracking - we don't need to record non-action, which is what currently happens when _linktrack records every link in every message sent to every user on every list by every admin - that's just insane.
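A minimal sketch of such an actions table, and of the single row a received click would add to it (again, names are assumptions rather than an existing schema):

    CREATE TABLE phplist_linktrack_actions (
      id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
      inmessageid INT UNSIGNED NOT NULL,  -- -> phplist_linktrack_inmessage.id (which URL in which message)
      userid      INT UNSIGNED NOT NULL,  -- -> phplist_user_user.id (who clicked)
      clicked     DATETIME NOT NULL,      -- when the click arrived
      PRIMARY KEY (id),
      KEY by_link (inmessageid, userid)
    );

    -- one row per received click; non-action is never stored
    INSERT INTO phplist_linktrack_actions (inmessageid, userid, clicked)
    VALUES (987, 4321, NOW());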
By using a structured redirect URL that is more SEO friendly (and re-usable in web pages by subscribers), with a (semi-)understandable standard structure rather than unicode glyphology and random numbering, user confidence in the link will increase (and the action level with it) - which, ultimately, is what we all want from the messages we send out. Above all, message sending will be more reliable thanks to the lower server overhead, and future campaign analysis will also be faster, easier, and open to more custom querying of the database.
I'm posting this here (rather than in Mantis) because it needs discussion by more database-savvy developers than me. It also needs input from marketeers (the people using phpList on the front line) as well as site managers.
Gaz