Alignment Newsletter in Chinese

齐智通讯

This post collects the Chinese translations of the Alignment Newsletter (AN) by Rohin Shah, Richard Ngo, Dan Hendrycks, Cody Wild, and others. The collection is updated on a rolling basis, and each Chinese edition is usually released two or three days after the English one. Comments are enabled on the documents, and all suggestions are welcome.

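AN #173 (Chinese): Language model results from DeepMind (July 20th, 2022)
AN #172 (Chinese): Apologies for the hiatus (July 4th, 2022)
AN #171 (Chinese): Disagreements between AI alignment optimists and pessimists (January 21st, 2022)
AN #170 (Chinese): Creating generalizable reward functions for multiple tasks by learning a model of functional similarity (December 8th, 2021)
AN #169 (Chinese): Creating generalizable reward functions for multiple tasks by learning a model of functional similarity (November 24th, 2021)
AN #168 (Chinese): Four technical topics for which Open Phil is soliciting grant proposals (October 28th, 2021)
AN #167 (Chinese): Concrete ML safety problems and their relevance to existential risk (October 20th, 2021)
AN #166 (Chinese): Is it crazy to claim we're in the most important century? (October 8th, 2021)
AN #165 (Chinese): When large models are more likely to lie (September 22nd, 2021)
AN #164 (Chinese): How well can language models write code? (September 15th, 2021)
AN #163 (Chinese): Using finite factored sets for causal and temporal inference (September 8th, 2021)
AN #162 (Chinese): Foundation models: a paradigm shift within AI (August 27th, 2021)
AN #161 (Chinese): Creating generalizable reward functions for multiple tasks by learning a model of functional similarity (August 20th, 2021)
AN #160 (Chinese): Building AIs that learn and think like people (August 13th, 2021)
AN #159 (Chinese): Building agents that know how to experiment, by way of procedurally generated games (August 4th, 2021)
AN #158 (Chinese): Should we be optimistic about generalization? (July 29th, 2021)
AN #157 (Chinese): Measuring misalignment in the technology underlying Copilot (July 24th, 2021)
AN #156 (Chinese): The scaling hypothesis: a plan for building AGI (July 16th, 2021)
AN #155 (Chinese): A Minecraft benchmark for algorithms that learn without reward functions (July 8th, 2021)
AN #154 (Chinese): What economic growth theory has to say about transformative AI (June 30th, 2021)
AN #153 (Chinese): Experiments that demonstrate failures of objective robustness (June 26th, 2021)
AN #152 (Chinese): The subtypes of Cooperative AI research (June 16th, 2021)
AN #151 (Chinese): How sparsity in the final layer makes a neural net debuggable (May 19th, 2021)
AN #150 (Chinese): The subtypes of Cooperative AI research (May 12th, 2021)
AN #149 (Chinese): The newsletter's editorial policy (May 5th, 2021)
AN #148 (Chinese): Analyzing generalization across more axes than just accuracy or loss (April 28th, 2021)
AN #147 (Chinese): An overview of the interpretability landscape (April 21st, 2021)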
AN #146 (Chinese): Plausible stories of how we might fail to avert an existential catastrophe (April 14th, 2021)
AN #145 (Chinese): Our three year anniversary! (April 7th, 2021)
AN #144 (Chinese): How language models can also be finetuned for non-language tasks (April 2nd, 2021)
AN #143 (Chinese): How to make embedded agents that reason probabilistically about their environments (March 24th, 2021)
AN #142 (Chinese): The quest to understand a network well enough to reimplement it by hand (March 17th, 2021)
AN #141 (Chinese): The case for practicing alignment work on GPT-3 and other large models (March 10th, 2021)
AN #140 (Chinese): Theoretical models that predict scaling laws (March 4th, 2021)
AN #139 (Chinese): How the simplicity of reality explains the success of neural nets (February 24th, 2021)
AN #138 (Chinese): Why AI governance should find problems rather than just solving them (February 17th, 2021)
AN #137 (Chinese): Quantifying the benefits of pretraining on downstream task performance (February 10th, 2021)
AN #136 (Chinese): How well will GPT-N perform on downstream tasks? (February 3rd, 2021)
AN #135 (Chinese): Five properties of goal-directed systems (January 27th, 2021)
AN #134 (Chinese): Underspecification as a cause of fragility to distribution shift (January 21st, 2021)
AN #133 (Chinese): Building machines that can cooperate (with humans, institutions, or other machines) (January 13th, 2021)
AN #132 (Chinese): Complex and subtly incorrect arguments as an obstacle to debate (January 6th, 2021)
AN #131 (Chinese): Formalizing the argument of ignored attributes in a utility function (December 31st, 2020)
AN #130 (Chinese): A new AI x-risk podcast, and reviews of the field (December 24th, 2020)
AN #129 (Chinese): Explaining double descent by measuring bias and variance (December 16th, 2020)
AN #128 (Chinese): Prioritizing research on AI existential safety based on its application to governance demands (December 9th, 2020)
AN #127 (Chinese): Rethinking agency: Cartesian frames as a formalization of ways to carve up the world into an agent and its environment (December 2nd, 2020)
AN #126 (Chinese): Avoiding wireheading by decoupling action feedback from action effects (November 26th, 2020)
AN #125 (Chinese): Neural network scaling laws across multiple modalities (November 11th, 2020)
AN #124 (Chinese): Provably safe exploration through shielding (November 4th, 2020)
AN #123 (Chinese): Inferring what is valuable in order to align recommender systems (October 28th, 2020)
AN #122 (Chinese): Arguing for AGI-driven existential risk from first principles (October 21st, 2020)
AN #121 (Chinese): Forecasting transformative AI timelines using biological anchors (October 14th, 2020)
AN #120 (Chinese): Tracing the intellectual roots of AI and AI alignment (October 7th, 2020)
AN #119 (Chinese): AI safety when agents are shaped by environments, not rewards (September 30th, 2020)
AN #118 (Chinese): Risks, solutions, and prioritization in a world with many AI systems (September 23rd, 2020)
AN #117 (Chinese): How neural nets would fare under the TEVV framework (September 16th, 2020)
AN #116 (Chinese): How to make explanations of neurons compositional (September 9th, 2020)
AN #115 (Chinese): AI safety research problems in the AI-GA framework (September 2nd, 2020)
AN #114 (Chinese): Theory-inspired safety solutions for powerful Bayesian RL agents (August 26th, 2020)
AN #113 (Chinese): Checking the ethical intuitions of large language models (August 19th, 2020)
AN #112 (Chinese): Engineering a Safer World (August 13th, 2020)
AN #111 (Chinese): The Circuits hypotheses for deep learning (August 5th, 2020)
AN #110 (Chinese): Learning features from human feedback to enable reward learning (July 29th, 2020)
AN #109 (Chinese): Teaching neural nets to generalize the way humans would (July 22nd, 2020)
AN #108 (Chinese): Why we should scrutinize arguments for AI risk (July 15th, 2020)
AN #107 (Chinese): The convergent instrumental subgoals of goal-directed agents (July 9th, 2020)
AN #106 (Chinese): Evaluating generalization ability of learned reward models (July 1st, 2020)
AN #105 (Chinese): The economic trajectory of humanity, and what we might mean by optimization (June 24th, 2020)
AN #104 (Chinese): The perils of inaccessible information, and what we can learn about AI alignment from COVID (June 18th, 2020)
AN #103 (Chinese): ARCHES: an agenda for existential safety, and combining natural language with deep RL (June 10th, 2020)
AN #102 (Chinese): Meta learning by GPT-3, and a list of full proposals for AI alignment (June 3rd, 2020)
AN #101 (Chinese): Why we should rigorously measure and forecast AI progress (May 27th, 2020)
AN #100 (Chinese): What might go wrong if you learn a reward function while acting (May 20th, 2020)
AN #99 (Chinese): Doubling times for the efficiency of AI algorithms (May 13th, 2020)
AN #98 (Chinese): Understanding neural net training by seeing which gradients were helpful (May 6th, 2020)
AN #97 (Chinese): Are there historical examples of large, robust discontinuities? (April 29th, 2020)
AN #96 (Chinese): Buck and I discuss/argue about AI Alignment (April 22nd, 2020)
AN #95 (Chinese): A framework for thinking about how to make AI go well (April 15th, 2020)
AN #94 (Chinese): AI alignment as translation between humans and machines (April 8th, 2020)
AN #93 (Chinese): The Precipice we’re standing at, and how we can back away from it (April 1st, 2020)
AN #92 (Chinese): Learning good representations with contrastive predictive coding (March 25th, 2020)
AN #91 (Chinese): Concepts, implementations, problems, and a benchmark for impact measurement (March 18th, 2020)
AN #90 (Chinese): How search landscapes can contain self-reinforcing feedback loops (March 11th, 2020)
AN #89 (Chinese): A unifying formalism for preference learning algorithms (March 4th, 2020)
AN #88 (Chinese): How the principal-agent literature relates to AI risk (February 27th, 2020)
AN #87 (Chinese): What might happen as deep learning scales even further? (February 19th, 2020)
AN #86 (Chinese): Improving debate and factored cognition through human experiments (February 12th, 2020)
AN #85 (Chinese): The normative questions we should be asking for AI alignment, and a surprisingly good chatbot (February 5th, 2020)
AN #84 (Chinese): Reviewing AI alignment work in 2018-19 (January 29th, 2020)
AN #83 (Chinese): Sample efficient deep learning with ReMixMatch (January 22nd, 2020)
AN #82 (Chinese): How OpenAI Five distributed their training computation (January 15th, 2020)
AN #81 (Chinese): Universality as a potential solution to conceptual difficulties in intent alignment (January 8th, 2020)
AN #80 (Chinese): Why AI risk might be solved without additional intervention from longtermists (January 2nd, 2020)
AN #79 (Chinese): Recursive reward modeling as an alignment technique integrated with deep RL (January 1st, 2020)
AN #78 (Chinese): Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison (December 25th, 2019)
AN #77 (Chinese): Double descent: a unification of statistical theory and modern ML practice (December 18th, 2019)
AN #76 (Chinese): How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations (December 4th, 2019)
AN #75 (Chinese): Solving Atari and Go with learned game models, and thoughts from a MIRI employee (November 27th, 2019)
AN #74 (Chinese): Separating beneficial AI into competence, alignment, and coping with impacts (November 20th, 2019)
AN #73 (Chinese): Detecting catastrophic failures by learning how agents tend to break (November 13th, 2019)
AN #72 (Chinese): Alignment, robustness, methodology, and system building as research priorities for AI safety (November 6th, 2019)
AN #71 (Chinese): Avoiding reward tampering through current-RF optimization (October 30th, 2019)
AN #70 (Chinese): Agents that help humans who are still learning about their own preferences (October 23rd, 2019)
AN #69 (Chinese): Stuart Russell’s new book on why we need to replace the standard model of AI (October 18th, 2019)
AN #68 (Chinese): The attainable utility theory of impact (October 14th, 2019)
AN #67 (Chinese): Creating environments in which to study inner alignment failures (October 7th, 2019)
AN #66 (Chinese): Decomposing robustness into capability robustness and alignment robustness (September 30th, 2019)
AN #65 (Chinese): Learning useful skills by watching humans “play” (September 23rd, 2019)
AN #64 (Chinese): Using Deep RL and Reward Uncertainty to Incentivize Preference Learning (September 16th, 2019)
AN #63 (Chinese): How architecture search, meta learning, and environment design could lead to general intelligence (September 10th, 2019)
AN #62 (Chinese): Are adversarial examples caused by real but imperceptible features? (August 22nd, 2019)
AN #61 (Chinese): AI policy and governance, from two people in the field (August 5th, 2019)
AN #60 (Chinese): A new AI challenge: Minecraft agents that assist human players in creative mode (July 22nd, 2019)
AN #59 (Chinese): How arguments for AI risk have changed over time (July 8th, 2019)
AN #58 (Chinese): Mesa optimization: what it is, and why we should care (June 24th, 2019)
AN #57 (Chinese): Why we should focus on robustness in AI safety, and the analogous problems in programming (June 5th, 2019)
AN #56 (Chinese): Should ML researchers stop running experiments before making hypotheses? (May 20th, 2019)
AN #55 (Chinese): Regulatory markets and international standards as a means of ensuring beneficial AI (May 4th, 2019)
AN #54 (Chinese): Boxing a finite-horizon AI system to keep it unambitious (April 27th, 2019)
AN #53 (Chinese): Newsletter turns one year old, and why overfitting isn’t a huge problem for neural nets (April 18th, 2019)
AN #52 (Chinese): Why we may not want our AI systems to model humans (April 5th, 2019)
AN #51 (Chinese): Cancelling within-batch generalization in order to get stable deep RL (April 2nd, 2019)
AN #50 (Chinese): How an AI catastrophe could occur, and an overview of AI policy from OpenAI researchers (March 28th, 2019)
AN #49 (Chinese): Understanding how image classifiers work, and a major increase in adversarial robustness (March 19th, 2019)
AN #48 (Chinese): Quantilization: bounding worst case unintended consequences by partially imitating humans (March 11th, 2019)
AN #47 (Chinese): Why AI safety needs social scientists (March 3rd, 2019)
AN #46 (Chinese): Yet another wall of text about GPT-2, and structural risks from AI (February 21st, 2019)
AN #45 (Chinese): How to extract human preferences from the state of the world (February 13th, 2019)
AN #44 (Chinese): Random search vs. gradient descent on Goodharting, and attention is not all you need; recurrence helps too (February 6th, 2019)
AN #43 (Chinese): The techniques behind AlphaStar, and the many arguments for AI safety (January 29th, 2019)
AN #42 (Chinese): Cooperative IRL as a definition of human-AI group rationality, and an empirical evaluation of theory of mind vs. model learning in HRI (January 21st, 2019)
AN #41: Building AI systems that require informed consent (January 17th, 2019)
AN #40: Recursive technological improvement resulting in Comprehensive AI Services (January 7th, 2019)
AN #39: Using GANs for unrestricted adversarial examples (January 1st, 2019)
AN #38: In which I arrogantly highlight my own interview. Also how compute affects AI timelines (December 25th, 2018)
AN #37: How to address “human safety problems”, and how AI systems need to account for “silly rules” (December 17th, 2018)
AN #36: Developing a theory of values to solve extrapolation issues, and an approach to train AI systems to reason well (December 11th, 2018)
AN #35: The dangers and non-inevitability of goal-directed behavior, and corrigibility through iterated distillation and amplification (December 3rd, 2018)
AN #34: Recursive reward modeling for agent alignment, and evaluating actions instead of outcomes (November 26th, 2018)
AN #33: Learning from both demos and preferences, and building a well-motivated AI instead of an AI with the right utility function (November 19th, 2018)
AN #32: Educational resources for deep RL, and more posts on embedded agency and value learning (November 12th, 2018)
AN #31: Sequences on the new Alignment Forum, and exploration by prediction error for random features (November 5th, 2018)
AN #30: Decomposition as training signal with iterated amplification and relational inductive biases with graph networks (October 29th, 2018)
AN #29: Autonomous driving through model-based imitation learning and the feasibility of interpretability (October 22nd, 2018)
AN #28: Threat models in adversarial examples research (October 15th, 2018)
AN #27: Aiming to solve AI safety in the limit of scaling arbitrarily far with Paul Christiano (October 8th, 2018)
AN #26: Classifying AI safety problems, and regularizing policies with an ensemble of dynamics models (October 2nd, 2018)
AN #25: Impact as changes to attainable utility and rationalism reality (September 24th, 2018)
AN #24: Contest on adversarial examples, counterfactuals for supervised learning, beating all of Atari with a single policy, and even more ML summaries (September 17th, 2018)
AN #23: Dreaming up goals and worlds, and what we want from a definition of impact (September 10th, 2018)
AN #22: Research agenda for AI governance (September 3rd, 2018)
AN #21: What happens at AI Impacts, RL phrased as probabilistic inference, and autonomous AI in Google’s data centers (August 27th, 2018)
AN #20: Can curiosity by itself lead to good behavior? (August 20th, 2018)
AN #19: OpenAI Five vs. Team Human and provable guarantees about neural nets (August 13th, 2018)
AN #18 (August 6th, 2018)
AN #17 (July 30th, 2018)
AN #16 (July 23rd, 2018)
AN #15 (July 16th, 2018)
AN #14 (July 9th, 2018)
AN #13 (July 2nd, 2018)
AN #12 (June 25th, 2018)
AN #11 (June 18th, 2018)
AN #10 (June 11th, 2018)
AN #9 (June 4th, 2018)
AN #8 (May 28th, 2018)
AN #7 (May 21st, 2018)
AN #6 (May 14th, 2018)
AN #5 (May 7th, 2018)
AN #4 (April 30th, 2018)
AN #3 (April 23rd, 2018)
AN #2 (April 16th, 2018)
AN #1 (April 9th, 2018)
