We have one huge old project with intensive mysql usage.
Due to some architectural errors, (sometimes development people do very strange things…) application continuously writes to several tables, partitioned by date. And they didn`t think about aggregation at all… Database growth very fast + 30 Gb per day. And it becomes real problem when we migrate data to SSD raid 10. (BTW it gives us 50 times performance boost!)
Our Application and reporting service doesn`t need such huge amount of data. Development of this project currently stalled, so only Operation team can do modifications.
I create archive database at slow SAS raid, and create aggregation sql. Main problem was – unknown partition number for data. There are 2 ways how to get it.
Use EXPLAIN PARTITIONS, and parse output, or just calculate number 🙂
How table was partitioned:
For Mysql 5.5
http://dev.mysql.com/doc/refman/5.5/en/partitioning-hash.html
When PARTITION BY HASH is used, MySQL determines which partition of num partitions to use based on the modulus of the result of the user function. In other words, for an expression expr, the partition in which the record is stored is partition number N, where N = MOD(expr, num). Suppose that table t1 is defined as follows, so that it has 4 partitions: CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE) PARTITION BY HASH( YEAR(col3) ) PARTITIONS 4; If you insert a record into t1 whose col3 value is '2005-09-15', then the partition in which it is stored is determined as follows: MOD(YEAR('2005-09-01'),4) = MOD(2005,4) = 1
So it`s better to calculate partition number.
Daily aggregation sql script (with some extra debug info):
`id` BIGINT(20) NOT NULL,
`create_date` DATE NOT NULL,
`time_interval` INT(11) NOT NULL,
`block_id` INT(11) NOT NULL,
`news_id` INT(11) NOT NULL,
`views_count` INT(11) NOT NULL,
`strategy_id` INT(11) NOT NULL,
`agency_id` INT(11) NOT NULL,
`site_id` INT(11) DEFAULT '0',
`campaign_id` INT(11) DEFAULT '0',
`topic_id` INT(11) DEFAULT '0'
) ENGINE=InnoDB AUTO_INCREMENT=1957948748 DEFAULT CHARSET=utf8 COMMENT='block aggregation temporary table'
SELECT id,create_date,0,block_id,news_id,SUM(views_count) AS views_count,strategy_id,agency_id,site_id,campaign_id,topic_id
FROM block_views_log
WHERE create_date=subdate(CURRENT_DATE, 2)
GROUP BY create_date,block_id,news_id,strategy_id,agency_id,site_id,campaign_id,topic_id
;
INSERT INTO archive.archive_block_views_log SELECT id,create_date,time_interval,block_id,news_id,views_count,strategy_id,agency_id,site_id,campaign_id,topic_id FROM block_views_log_tmp;
DROP TABLE IF EXISTS block_views_log_tmp;
SELECT SUM(views_count) FROM archive.archive_block_views_log WHERE create_date=subdate(CURRENT_DATE, 2);
SELECT SUM(views_count) FROM MYDATABASE.block_views_log WHERE create_date=subdate(CURRENT_DATE, 2);
EXPLAIN partitions SELECT * FROM MYDATABASE.block_views_log WHERE create_date=subdate(CURRENT_DATE, 2) \G;
SET @s=CONCAT('ALTER TABLE MYDATABASE.block_views_log TRUNCATE PARTITION ','p',MOD(TO_DAYS(subdate(CURRENT_DATE, 2)),10));
SELECT @s; PREPARE trstmt FROM @s; EXECUTE trstmt; DEALLOCATE PREPARE trstmt;
Agregation results
42 000 000 records -> 700 000 records
Damn, why they didn`t do aggregation at application…
For now I`m in doubts to use mysql events to run this script, mysql internal scheduler have some disadvantages, or to use cron task.
If I choose mysql events, it will be a topic for next story :).
Sergey Korobanov.
0 Comments.