TY - GEN
T1 - Reading Time Prediction Model on Chinese Technical Documentation
AU - Gao, Zhijun
AU - Li, Fan
AU - Yu, Jingsong
PY - 2020/7
Y1 - 2020/7
N2 - This paper was presented at the Invited Panel session “Technical Communication in China”. As readers increasingly turn to online materials, a range of studies has examined the reading time and legibility of online texts. Text-related attributes such as font size and letter spacing are commonly used variables in this field. This study investigates the factors that influence the reading time of Chinese technical documentation and builds a Decision Tree model to predict it. The experiment draws on log data covering over a million user visits to a cloud service provider's website. Each user session records the visit time, stay time, visit step, visit device, and many other data fields. In addition to user behavioral data from the log files, metrics describing the technical documentation itself are collected: for every document used in the experiment, the word count, image count, link count, and section count are scraped using web crawlers. Linear correlation analysis is applied to explore the correlations between the variables used for prediction. The results show that the Decision Tree model achieves 75 percent accuracy.
AB - This paper was presented at the Invited Panel session “Technical Communication in China”. As readers increasingly turn to online materials, a range of studies has examined the reading time and legibility of online texts. Text-related attributes such as font size and letter spacing are commonly used variables in this field. This study investigates the factors that influence the reading time of Chinese technical documentation and builds a Decision Tree model to predict it. The experiment draws on log data covering over a million user visits to a cloud service provider's website. Each user session records the visit time, stay time, visit step, visit device, and many other data fields. In addition to user behavioral data from the log files, metrics describing the technical documentation itself are collected: for every document used in the experiment, the word count, image count, link count, and section count are scraped using web crawlers. Linear correlation analysis is applied to explore the correlations between the variables used for prediction. The results show that the Decision Tree model achieves 75 percent accuracy.
KW - Decision tree
KW - Online documentation
KW - Readability
KW - Technical communication
UR - http://www.scopus.com/inward/record.url?scp=85092647874&partnerID=8YFLogxK
U2 - 10.1109/ProComm48883.2020.00046
DO - 10.1109/ProComm48883.2020.00046
M3 - Conference contribution
SN - 978-1-7281-5564-7
T3 - IEEE International Professional Communication Conference (ProComm)
SP - 161
EP - 167
BT - 2020 IEEE International Professional Communication Conference (ProComm)
PB - IEEE
CY - Piscataway, NJ
T2 - IEEE International Professional Communication Conference, ProComm 2020
Y2 - 20 July 2020 through 21 July 2020
ER -