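Lecture No.: jz-yjsb-2023-y016
Lecture Title: Risk-Sensitive Markov Decision Processes with Long-Run CVaR Criterion
Speaker: Prof. Li Xia, Sun Yat-sen University
Time: Friday, December 1, 2023, 3:30 p.m.
Venue: Conference Room, 4th Floor, Science and Education Building, East Area, Fucheng Road Campus, Beijing Technology and Business University
Audience: Graduate and undergraduate students of the Department of Information Management, School of Computer and Artificial Intelligence
Organizer: School of Computer and Artificial Intelligence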
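About the Speaker:
Li Xia is a professor at the School of Business, Sun Yat-sen University. He received his bachelor's and Ph.D. degrees from the Department of Automation, Tsinghua University, in 2002 and 2007, respectively, and was a joint-training doctoral student at the Hong Kong University of Science and Technology during his Ph.D. studies. After graduating, he carried out research at IBM Research China and at King Abdullah University of Science and Technology in Saudi Arabia. From 2011 to 2019 he taught in the Department of Automation, Tsinghua University, serving successively as lecturer and associate professor (doctoral supervisor), and he joined Sun Yat-sen University in 2019. His main research interests are the theory of Markov decision processes, reinforcement learning, queueing theory, and game theory, together with applications in energy, finance, and other fields. He has published more than 100 papers, holds 3 U.S. patents and 8 Chinese patents, and has led 4 National Natural Science Foundation of China projects, 3 sub-projects of the National Key R&D Program, and several collaborative R&D projects with Huawei and other companies. His academic service includes serving as an associate editor (AE) of leading international SCI journals such as IEEE Transactions on Automation Science and Engineering and Discrete Event Dynamic Systems. His academic honors include the Second Prize of the Natural Science Award for Higher Education Institutions from the Ministry of Education, received in 2021 and in 2014.
Lecture Content: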
CVaR (Conditional Value at Risk) is an important risk measure in financial engineering. Traditional studies on optimizing CVaR metrics usually address single-stage problems. When extended to multi-stage scenarios, the CVaR risk function is not additive across stages, so it does not fit the standard MDP (Markov decision process) model and the principle of dynamic programming fails. In this talk, we study the MDP optimization problem under the long-run CVaR criterion using a new tool called sensitivity-based optimization. By introducing a pseudo CVaR metric, we reformulate the original problem as a bilevel MDP problem: the inner level is a standard MDP that optimizes the pseudo CVaR, and the outer level is an optimization problem over a single auxiliary variable. We derive a CVaR difference formula that quantifies the difference between the long-run CVaR values under any two randomized policies. With this difference formula, we prove the optimality of deterministic policies. We also obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for locally optimal policies but only a necessary condition for globally optimal policies. We further develop a policy-iteration-type algorithm to optimize CVaR efficiently, and we prove that this iterative algorithm converges to local optima in the mixed policy space. Finally, we conduct a numerical experiment on portfolio management to demonstrate the main results. Our work may shed light on dynamically optimizing CVaR from a sensitivity viewpoint.
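
For readers unfamiliar with the role of a single auxiliary variable in CVaR optimization, the sketch below illustrates the standard Rockafellar-Uryasev representation on a static sample: CVaR at level alpha equals the minimum over y of y + E[(X - y)^+] / (1 - alpha). It is not the speaker's long-run MDP algorithm. The function names, the cost convention (larger values are worse), and the sample data are assumptions made only for illustration.

```python
import numpy as np

def cvar_tail_average(costs, alpha):
    """CVaR as the mean of the worst (1 - alpha) fraction of the samples.

    Assumes (1 - alpha) * len(costs) is a whole number, so the tail is
    an exact count of samples.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    k = int(round((1 - alpha) * len(costs)))
    return costs[-k:].mean()

def cvar_auxiliary(costs, alpha):
    """CVaR via minimization over a single auxiliary variable y.

    For an empirical distribution the objective is convex and piecewise
    linear in y, so a minimizer can be found among the sample values.
    Returns (minimum objective value, one minimizing y).
    """
    costs = np.asarray(costs, dtype=float)
    candidates = np.unique(costs)
    # Row i holds (x - y_i)^+ for every sample x; average over samples.
    excess = np.maximum(costs[None, :] - candidates[:, None], 0.0).mean(axis=1)
    objective = candidates + excess / (1 - alpha)
    best = np.argmin(objective)
    return objective[best], candidates[best]

if __name__ == "__main__":
    costs = np.arange(1, 11)          # hypothetical costs 1, 2, ..., 10
    value, y_star = cvar_auxiliary(costs, alpha=0.8)
    # Both computations should agree up to floating-point error.
    print(np.isclose(value, cvar_tail_average(costs, alpha=0.8)))
```

In this static setting, fixing y leaves an ordinary expectation to evaluate, which loosely mirrors the bilevel split described in the abstract: an inner problem for a fixed auxiliary variable and an outer one-dimensional search over that variable.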
