Issue Description and Solution Approaches : In response to a new request from the Sahamati team, five new columns were added to two tables, and the data processing layer was modified to populate these columns for incoming data. Subsequently, there is a need to update the historical dataset, consisting of approximately 71 lakhs records.
Three approaches were considered:
Approach 1: Traditional DB Update Statement
- Not recommended due to potential performance issues and pressure on the database.
Approach 2: Create a new table, Update historical values, and Swap it.
- Time-consuming and involves downtime during the table renaming process. Live data needs to be updated to the new tables to ensure zero data loss.
Approach 3:
- Perform data transformation/enrichment at Data Delivery Layer
- Create master tables, build a dynamic view by joining existing tables with master tables.
- No downtime required, offers flexibility, and supports future requests for additional columns.
- Leverages the benefits of hyperscale DB by performing data enrichment at the time of data delivery.
Opted Solution: Approach 3
- Chosen for its dynamic nature, efficiency in handling both historical and current datasets.
- Provides an opportunity to leverage hyperscale DB benefits in data enrichment at the time of data delivery.
- Offers flexibility and seamless support for future requests involving additional columns.
|
Solution Steps in Detail
Sahamati New Tables & Views
Tables Schema:
Master AA - To store the organization name for the AA’s
create table master_aa on CLUSTER vusmart ( aaId LowCardinality(String), aa_org_name LowCardinality(String), last_update_date DateTime default now() ) Engine = ReplicatedReplacingMergeTree() ORDER BY (aaId);
|
Master FIP - To store the organization name for the FIP’s and other associated attributes
create table master_fip on CLUSTER vusmart ( fipId LowCardinality(String), fip_org_name LowCardinality(String), regulator LowCardinality(String), license_type LowCardinality(String),
weights Float last_update_date DateTime default now() )Engine = ReplicatedReplacingMergeTree() ORDER BY (fipId);
|
Master Business Hour - To store the business hour flag for each hour in a day
create table master_business_hour on CLUSTER vusmart ( hour UInt8, is_biz_hr LowCardinality(String) ) Engine = ReplicatedReplacingMergeTree() ORDER BY (hour);
|
Views:
Push Metric View: This view query combines the push metric dataset with the master table of AAs, FIPs and business hours to bring a consolidated view. This view should be used in the existing storyboard queries instead of “push metric” table.
CREATE OR REPLACE VIEW push_metric_data_view ON CLUSTER vusmart AS SELECT pm.timestamp AS timestamp , pm.aaId AS aaId , pm.fipId AS fipId , pm.event_name AS event_name , pm.success_percent AS success_percent , pm.timeout_percent AS timeout_percent , pm.acc_notfound_percent AS acc_notfound_percent , pm.server_error_percent AS server_error_percent , pm.client_error_percent AS client_error_percent , pm.latencyP50_ms AS latencyP50_ms , pm.latencyP95_ms AS latencyP95_ms , pm.latencyP99_ms AS latencyP99_ms , pm.latency_avg_ms AS latency_avg_ms , pm.window AS window , aa.aa_org_name AS aa_org_name , fip.fip_org_name AS fip_org_name , fip.regulator AS regulator , fip.license_type AS license_type , bh.is_biz_hr AS business_hour FROM (SELECT *, toHour(toDateTime(timestamp,'Asia/Kolkata')) as hour FROM push_metric_data ) pm JOIN master_business_hour bh ON bh.hour = pm.hour JOIN (SELECT fipId, fip_org_name, regulator, license_type FROM master_fip WHERE (fipId, last_update_date) IN (SELECT fipId, MAX(last_update_date) FROM master_fip GROUP BY fipId) ) fip ON pm.fipId = fip.fipId JOIN (SELECT aaId, aa_org_name FROM master_aa WHERE (aaId, last_update_date) IN (SELECT aaId, MAX(last_update_date) FROM master_aa GROUP BY aaId) ) aa ON pm.aaId = aa.aaId
|
Push Notification View: This view query combines the push notification dataset with the master table of AAs, FIPs and business hours to bring a consolidated view. This view should be used in the existing storyboard queries instead of “push notification” table.
CREATE OR REPLACE VIEW push_notification_metric_data_view ON CLUSTER vusmart AS SELECT pnm.timestamp AS timestamp , pnm.aaId AS aaId , pnm.fipId AS fipId , pnm.notification_event_name AS notification_event_name , pnm.success_response_percent AS success_response_percent , pnm.failure_response_percent AS failure_response_percent , pnm.no_response_percent AS no_response_percent , pnm.latencyP50_ms AS latencyP50_ms , pnm.latencyP95_ms AS latencyP95_ms , pnm.latencyP99_ms AS latencyP99_ms , pnm.latency_avg_ms AS latency_avg_ms , pnm.window AS window , aa.aa_org_name AS aa_org_name , fip.fip_org_name AS fip_org_name , fip.regulator AS regulator , fip.license_type AS license_type , bh.is_biz_hr AS business_hour FROM (SELECT *, toHour(toDateTime(timestamp,'Asia/Kolkata')) AS hour FROM push_notification_metric_data) pnm JOIN master_business_hour bh ON bh.hour = pnm.hour JOIN (SELECT fipId, fip_org_name, regulator, license_type FROM master_fip WHERE (fipId, last_update_date) IN (SELECT fipId, MAX(last_update_date) FROM master_fip GROUP BY fipId)) fip ON pnm.fipId = fip.fipId JOIN (SELECT aaId, aa_org_name FROM master_aa WHERE (aaId, last_update_date) IN (SELECT aaId, MAX(last_update_date) FROM master_aa GROUP BY aaId)) aa ON pnm.aaId = aa.aaId
|
Data Load via CSV command:
kubectl exec -i chi-clickhouse-vusmart-0-0-0 -n vsmaps -- clickhouse-client --user vusmart --password Vunet#7814 -q "insert into {table_name} FORMAT CSVWithNames" < {path_to_csv}/{file_name.csv}
|
Sample CSV:
AA
FIP
Business Hour
|