[WishList] Restructure SQL store for clicktracking

Things you'd like to see in phpList

[WishList] Restructure SQL store for clicktracking

Postby GazOutEast » 6:16am, Thu 07 Oct, 2010

Simply put - the current SQL recording for clicktracking is extremely inefficient and server-query expensive.

On a new install with only one email sent to 4,300 users (and sending not complete yet either) the SQL _linktrack table has 10,398 rows - this is huge, and will rapidly draw the server to a standstill as it explodes over a short series of emails. Additionally, it may the / one of the causes of the massive number of users reporting unacceptable frequency / consistency of Server 500 errors during process_queue triggering - the sheer number of entries to initially write to server bursting server resource allotment.

Currently click tracking uses two tables - _linktrack for the pairing of user to link sent, and _linktrack_userclick for recording links actioned by user.

** The _linktrack table needs broken apart as do the queries creating it - there should be no user association in _linktrack - it should be solely a table for pairing the original URL and the redirect through phplist URL. The original URLs need unique keys and should not be repeated ever (e.g. domain.com/phplist/index.php?p=subscribe1 should only ever appear in this table once - not once for every user on every message and every time a message is saved during editing).
** Another table _linktrack_inmessage needs structured where a relationship between the _linktrack URL pairing is recorded per message created - this is done at message creation save point, not at send point - if a new message has only one link in it, only one row would be added to this table with a unique key, if 10 links are in the message, 10 rows would be added (each with a unique key)

In this way, original URLs re-used in multiple message still only have a single entry in _linktrack and the redirect URL is updated per message in _linktrack_inmessage - for example -

The redirect URL in _linktrack is structured something like - domain.com/[target_ID]
Content being - [target_ID] is the _linktrack unique key
Because the original URL is very likely to have a high repetition rate across time, it does not need one entry for every message for every user - that is grossly inefficient.

In _linktrack_inmessage the target id from _linktrack is a secondary key, the message id should also be a secondary key (taken from the _message table) - the pairing of these two causing a new unique key

Having structured that - a new link tracking option becomes available at an intermediate tracking level
- track total clicks per same URL across all messages
- track total clicks per same URL across a single message to all recipient users
The second option in particular would be exceptionally useful for message copy writers to see how their message structure and content affects rate of subscribe and unsubscribe to their lists. Action on the common (un)subscribe links could be tied directly to each message, and the message scrutinised for what caused that recipient action behaviour.

Then we have to track user receipt and action of individual URLs - this is where having the unique ID from _linktrack_inmessage comes into play.

Either in a table like _user_user_history or in a new table ( _user_user_linkhistory ?) the _inmessage unique ket is recorded in an array as having been sent to them. Action on any link can then also be recorded in a separate column, also in an array - did this user click this link? yes / no - then becomes possible. Multiple actions on a link by the same user can be structured into the array as a (link_id) (action_count) pairing - I realise this may result in large cell-contents for some users (webmasters as well as subscribers) but I'm typing this as I think it, therefore this element may need further thought and ideas.

One option may be to have a secondary script and database, whose sole job is recording the click actions of users, moving the load out of the message process server and database - feedback and thoughts please. Let's be honest, just because it's convention to do so, there is nothing set in concrete that says an open-source web application must be self-contained and use a single database, is there?

Additionally, the structuring of the redirect URL needs to be made more user friendly for customer confidence purposes - this in itself would allow better recording and tracking - just a wildcard thought on this before I rush off to a meeting .... a structure along the lines of - domain.com/[prefix_linktrack_ID]/[_inmessage_ID]/[user_ID] where the following applies
- [prefix_linktrack_ID] - prefix = admin_ID or list_ID and is a routing item, linktrack_ID is the unique key from _linktrack which via the logic code yields the original URL to redirect the subscriber to.
- [_inmessage_ID] is the unique key from _linktrack_inmessage (proposed above)
- [user_ID] is multi-purpose - it could be the direct user_ID from _user_user or it could be a concatenation of several keys - e.g. user+list+admin

Structuring the redirect (published) URL like this would allow code to have a unique table _linktrack_actions (?) where only received clicks are recorded, with all the relevant data, instead of the current system of bloating _linktrack, or the unthought system in the middle of this post using arrays in the user tables. After all - we only need to record successful action in the click tracking - we don't need to be recording non-action, which is what is currently happening by using _linktrack to record every link in every message sent to every user on every list by every admin - that's just insane.

By using a structured redirect URL, which will be more SEO friendly (and re-usable in web pages by subscribers) that has a (semi-)understandable standard URL structuring (as opposed to unicode glyphology and random numbering), user confidence in the link will be increased (and action level increase with it) - which ultimately, is what we all want from the messages we send out. Primarily, message sending will be more reliable due to less expensive server overhead, and future campaign analysis will also be faster, easier, and with more options for custom querying of the database.

I'm posting this here (rather than in Mantis) because it needs discussion by more database-savvy developers than me. It also needs input from marketeers (the people using phpList on the front line) as well as site managers.

Gaz
If it's fixed, I can break it
GazOutEast
PL Geek
 
Posts: 61
Joined: 4:47am, Thu 08 Apr, 2010

Re: [WishList] Restructure SQL store for clicktracking

Postby GazOutEast » 6:28am, Thu 07 Oct, 2010

Just an updated thought - the current bloat in table _linktrack is caused by giving every user a unique randomised code for the redirect link (or so it seems) - again, this is crazy and wasteful of server resources, and it is completely unnecessary - see the structured link system above.

This is an URGENT reason to restructure the entire linktracking module in my opinion - the more I dig into the 500 server error and script hanging at process_queue inception and during processing, the more I am convinced the database flooding caused by unique link codes for each link for each user for each message, is the cause of the mod_fcgid collapsing in exhaustion.
If it's fixed, I can break it
GazOutEast
PL Geek
 
Posts: 61
Joined: 4:47am, Thu 08 Apr, 2010

Re: [WishList] Restructure SQL store for clicktracking

Postby balidave » 5:08am, Tue 06 Mar, 2012

I second this restructuring and am sad to see no one's responded to this (or has it been resolved??).

I just had to reboot my whole server due to problems from this table (_linktrack) -- it had 3.5 milion records in it after only a few emails and was over a GB in size! I have 50,000 users but with click-tracking on and having 15-20 links in each email, it quickly got out of hand. I didn't realize the cause tho I did turn it off. The table got zapped and repair wouldn't work tho it wreaked havoc on my server (load and filling up /tmp). Email for all domains on the server came to a standstill and server load went crazy, and some websites looked pretty bizarre. After rebooting I had to empty this table since it was in an error state and was non-functional.

So I hope someone out there is paying attention -- at least put BIG WARNINGS about using click-tracking, tho a restructuring of the tables as GasOutEast suggests seems in order.

BTW I'm a long time PHPList user (~10 years) and BIG fan of it -- it's a marvelous creation but needs some attention in this area.
balidave
phpLister
 
Posts: 12
Joined: 7:14am, Wed 19 Jul, 2006


Return to Feature Requests, Wish-list

Who is online

Users browsing this forum: No registered users and 1 guest

cron