This article is dedicated to the younger students who feel lost in college, or who are about to enter a computer science program and want to work in big data. My advice as a senior: study the fundamentals solidly during college and don't aim beyond your reach, and get plenty of physical exercise. Do those two things well and you are set. Above all, don't try to branch into too many fields at once; you will end up feeling like you know a bit of everything while mastering nothing, which makes it easy to embarrass yourself.
1 Game Industry Data Analysis Workflow and Metrics
1.1 Data Sources
Nginx request logs, filtered and cleaned (ETL work):
Game event-tracking logs recording business logic:
Batch and real-time processing to build a profile of each user:
Assume the following is a sample of Nginx request logs after data cleaning. The fixed format is: (UserID IP Date RequestURI RequestAddress RequestStatus RequestAgent)
e4ec9bb6f2703eb7 180.21.76.203 2020-06-30T09:11:14+00:00 /u3d/v2/appconfig 127.0.0.1:8080 200 "BestHTTP"
1f85152896978527 171.43.190.8 2020-06-30T09:11:14+00:00 /u3d/v2/userAction 127.0.0.1:8080 200 "BestHTTP"
Requirements:
Assume the following is a sample of user login-status logs. The fixed format is: (UserID IP RequestStatus Timestamp)
e4ec9bb6f2703eb7 180.21.76.203 success 1558430815
1f85152896978527 171.43.190.8 fail 1558430826
Requirements:
Assume there are two tables: new_users (about 6 MB of data per day) and play_stages (about 10 GB per day). Both tables contain the following fields: UserID (user ID), appName (game name), appVersion (game version), appPlatform (Android or iOS).
Requirement: compute new-user retention rates for days 1-7, 15, 30, and 90 (10 retention days in total).
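One way to produce these numbers, sketched with PySpark rather than the Hive-on-Spark setup described later in this article. The registration/activity date column (called dt here) and the overall table layout are assumptions, since only UserID, appName, appVersion and appPlatform are listed above:

from pyspark.sql import SparkSession, functions as F

# Assumption: both tables also carry a date column `dt` (yyyy-MM-dd);
# the article does not name it.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

new_users = spark.table("new_users").select("UserID", F.col("dt").alias("reg_dt"))
activity = spark.table("play_stages").select("UserID", F.col("dt").alias("act_dt")).distinct()

RETENTION_DAYS = list(range(1, 8)) + [15, 30, 90]   # the 10 requested retention days

joined = (new_users.join(activity, "UserID", "left")
          .withColumn("day_n", F.datediff("act_dt", "reg_dt")))

# For each registration date: retention(N) = users active exactly N days later / new users that day
agg = joined.groupBy("reg_dt").agg(
    F.countDistinct("UserID").alias("new_users"),
    *[F.countDistinct(F.when(F.col("day_n") == n, F.col("UserID"))).alias(f"d{n}_retained")
      for n in RETENTION_DAYS])

result = agg.select(
    "reg_dt", "new_users",
    *[F.round(F.col(f"d{n}_retained") / F.col("new_users"), 4).alias(f"d{n}_retention")
      for n in RETENTION_DAYS])

result.show(truncate=False)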
3.4 Redis Programming
Suppose a game has 20 million users and a DAU of roughly 1.5 million, and you are asked to build a game leaderboard based on the stage (level) each player has reached. How would you design it?
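A common design is a Redis sorted set: member = user ID, score = the highest stage the player has cleared. Twenty million members fit comfortably in a single ZSET, and a separate per-day key (set to expire) can serve a board for the roughly 1.5M daily active players. A minimal sketch with the redis-py client; the host, key name and helper functions are illustrative assumptions:

import redis

# Assumed connection details and key name
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
LEADERBOARD = "leaderboard:stage"

def report_stage(user_id: str, stage: int) -> None:
    """Record the player's stage, keeping only their best (highest) value."""
    current = r.zscore(LEADERBOARD, user_id)
    if current is None or stage > current:
        r.zadd(LEADERBOARD, {user_id: stage})

def top_n(n: int = 10):
    """Top-N players with their stage scores, highest first."""
    return r.zrevrange(LEADERBOARD, 0, n - 1, withscores=True)

def my_rank(user_id: str):
    """1-based rank of a player, or None if not on the board."""
    rank = r.zrevrank(LEADERBOARD, user_id)
    return None if rank is None else rank + 1

# Example usage
report_stage("e4ec9bb6f2703eb7", 120)
print(top_n(10))
print(my_rank("e4ec9bb6f2703eb7"))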
4 Enterprise-Grade Big Data Architecture Design
4.1 Architecture Design
架構設計圖.png (architecture design diagram)
4.2 Data Collection
The client sends its logs to the collection endpoint, the endpoint forwards the data to the Kafka message broker, and Flume uses Kafka as a source to write the data into Amazon S3.
4.2.1 Creating the Kafka Topics
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-diamond
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-ads
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-launch
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-stage
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-gift
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-shop
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-prop
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-ball
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 2 --topic topic-s3-airdrop
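For illustration, the sketch below (kafka-python; the article does not say what the collection endpoint is written in) publishes one JSON event, shaped like the Hive schema in section 4.3.1, to the stage topic. All field values are made-up examples:

import json
import time
from kafka import KafkaProducer

# Assumption: the collection endpoint uses kafka-python; the article only
# says that client logs are forwarded to Kafka.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One event shaped like the Hive external table in 4.3.1 (sample values)
event = {
    "uid": "e4ec9bb6f2703eb7",
    "appVersion": "2.1.0",
    "appName": "bricks",
    "appPlatform": "Android",
    "ip": "180.21.76.203",
    "countryCode": "US",
    "systimestamp": int(time.time() * 1000),
    "currentTime": int(time.time() * 1000),
    "clientTimeStamp": str(int(time.time())),
    "groupId": "A",
    "kindType": "stage",
    "params": {"stage": "35", "result": "win"},
}

producer.send("topic-s3-stage", event)
producer.flush()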
# Flume agent a1: Kafka source -> memory channel -> HDFS sink (S3 via s3a)
a1.sources=r1
a1.channels=c1
a1.sinks=k1

# Custom interceptor; the sink path below relies on a logType event header
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.pusidun.applogs.flume.interceptor.S3CollInterceptor$Builder

# Kafka source consuming every topic that matches topic-s3-*
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.zookeeperConnect = localhost:2181
a1.sources.r1.kafka.topics.regex = ^topic-s3-.*$

# In-memory channel
a1.channels.c1.type=memory
a1.channels.c1.capacity=100000
a1.channels.c1.transactionCapacity=10000

# HDFS sink writing to S3, partitioned by log type and date;
# files roll every 600 s, never by size or event count
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = s3a://bricks-playable/logs/%{logType}/%Y%m/%d
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.rollInterval = 600
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = DataStream

# Bind source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1
nohup bin/flume-ng agent \
-c conf \
-n a1 \
-f conf/s3.conf \
-Dflume.root.logger=DEBUG,console &
s3.png
4.2.5 Data Volume Produced per Day at 1.5M DAU
data.png
4.3 Offline Data Analysis
Offline analysis is done with Hive on Spark.
4.3.1 Creating the Hive Tables
-- Create one Hive external table per log type:
-- s3_stage | s3_launch | s3_ads | s3_diamond | s3_diamondShop | s3_gift | s3_airdrop | s3_prop | s3_ball | s3_shopWindow
CREATE EXTERNAL TABLE <table_name>(
uid STRING,
appVersion STRING,
appName STRING,
appPlatform STRING,
ip STRING,
countryCode STRING,
systimestamp BIGINT,
currentTime BIGINT,
clientTimeStamp STRING,
groupId STRING,
kindType STRING,
params Map<STRING,STRING>
)PARTITIONED BY
(ym string, day string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
-- Add the custom UDF jar
ADD JAR /opt/apache/hive-3.1.2/lib/app-logs-hive-udf.jar;
-- Register the custom UDFs: start of day/week/month and time formatting
CREATE FUNCTION getdaybegin AS 'com.pusidun.applogs.udf.hive.DayBeginUDF';
CREATE FUNCTION getweekbegin AS 'com.pusidun.applogs.udf.hive.WeekBeginUDF';
CREATE FUNCTION getmonthbegin AS 'com.pusidun.applogs.udf.hive.MonthBeginUDF';
CREATE FUNCTION formattime AS 'com.pusidun.applogs.udf.hive.FormatTimeUDF';
vim .exportData.sql
ALTER TABLE s3_stage ADD PARTITION(ym='${ym}',day='${day}') LOCATION 's3a://bricks-playable/logs/stage/${ym}/${day}/';
ALTER TABLE s3_launch ADD PARTITION(ym='${ym}',day='${day}') LOCATION 's3a://bricks-playable/logs/launch/${ym}/${day}/';
ALTER TABLE s3_ads ADD PARTITION(ym='${ym}',day='${day}') LOCATION 's3a://bricks-playable/logs/ads/${ym}/${day}/';
ALTER TABLE s3_diamond ADD PARTITION(ym='${ym}',day='${day}') LOCATION 's3a://bricks-playable/logs/diamond/${ym}/${day}/';
ALTER TABLE s3_gift ADD PARTITION(ym='${ym}',day='${day}') LOCATION 's3a://bricks-playable/logs/gift/${ym}/${day}/';
ALTER TABLE s3_airdrop ADD PARTITION(ym='${ym}',day='${day}') LOCATION 's3a://bricks-playable/logs/airdrop/${ym}/${day}/';
ALTER TABLE s3_prop ADD PARTITION(ym='${ym}',day='${day}') LOCATION 's3a://bricks-playable/logs/prop/${ym}/${day}/';
ALTER TABLE s3_ball ADD PARTITION(ym='${ym}',day='${day}') LOCATION 's3a://bricks-playable/logs/ball/${ym}/${day}/';
vim hive-exec.sh
#!/bin/bash
# Yesterday's partition keys, e.g. systime=202006-29 -> ym=202006, day=29
systime=`date -d "1 day ago" +%Y%m-%d`
ym=`echo ${systime} | awk -F '-' '{print $1}'`
day=`echo ${systime} | awk -F '-' '{print $2}'`
# Substitute the ${ym}/${day} placeholders in the ALTER TABLE template
cp /opt/s3/.exportData.sql /opt/s3/exportData.sql
sed -i 's/${ym}/'${ym}'/g' /opt/s3/exportData.sql
sed -i 's/${day}/'${day}'/g' /opt/s3/exportData.sql
# Run the generated partition statements
hive -f /opt/s3/exportData.sql
zeppelin.png
4.3.6 Spark Job Logs
spark-history-jobs.png
4.4 Real-Time Data Analysis
Flink consumes the Kafka data to compute the total in-app-purchase (IAP) revenue per hour (written to ES), the hourly Top 10 request URLs, and the hourly count of active users.
4.4.1 The Flink Web UI
flink-job.png
4.4.2 Flink: Computing Hourly IAP Totals and Writing Them to ES
es.png
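A minimal sketch of the hourly IAP total as a Flink SQL job submitted through PyFlink. The article shows no job code, so the source topic, field names, event-time handling and the Elasticsearch index are assumptions, and the Kafka and elasticsearch-7 connector jars must be on the classpath:

from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: IAP (diamond) events from Kafka; topic and fields are assumptions
t_env.execute_sql("""
    CREATE TABLE iap_events (
        uid STRING,
        amount DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '30' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'topic-s3-diamond',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'flink-iap',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Sink: hourly totals written to Elasticsearch; index name is an assumption
t_env.execute_sql("""
    CREATE TABLE iap_hourly (
        window_start TIMESTAMP(3),
        window_end TIMESTAMP(3),
        total_amount DOUBLE
    ) WITH (
        'connector' = 'elasticsearch-7',
        'hosts' = 'http://localhost:9200',
        'index' = 'iap_hourly'
    )
""")

# One-hour tumbling event-time window summing purchase amounts
t_env.execute_sql("""
    INSERT INTO iap_hourly
    SELECT
        TUMBLE_START(event_time, INTERVAL '1' HOUR) AS window_start,
        TUMBLE_END(event_time, INTERVAL '1' HOUR)   AS window_end,
        SUM(amount)                                 AS total_amount
    FROM iap_events
    GROUP BY TUMBLE(event_time, INTERVAL '1' HOUR)
""")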
4.5 Big Data Cluster Monitoring
cloudera-manager.png