University of Science and Technology Liaoning, Undergraduate Graduation Design
Learning Control of Robot Manipulators
ROBERTO HOROWITZ
Department of Mechanical Engineering
University of California at Berkeley
Berkeley, CA 94720, U.S.A.
Phone: (510) 642-4675
e-mail: horowitz@canaima.berkeley.edu
Abstract
Learning control encompasses a class of control algorithms for programmable machines such as robots which attain, through an iterative process, the motor dexterity that enables the machine to execute complex tasks. In this paper we discuss the use of function identification and adaptive control algorithms in learning controllers for robot manipulators. In particular, we discuss the similarities and differences between betterment learning schemes, repetitive controllers and adaptive learning schemes based on integral transforms. The stability and convergence properties of adaptive learning algorithms based on integral transforms are highlighted, and experimental results illustrating some of these properties are presented.
Key words: Learning control, adaptive control, repetitive control, robotics
Introduction
The emulation of human learning has long been among the most sought-after and elusive goals in robotics and artificial intelligence. Many aspects of human learning are still not well understood. However, much progress has been achieved in robot motion control toward emulating how humans develop the motor skills necessary to execute complex motions. In this paper we use the term learning controllers for the class of control systems that generate a control action in an iterative manner, using a function adaptation algorithm, in order to execute a prescribed action. In typical learning control applications the machine under control repeatedly attempts to execute a prescribed task, while the adaptation algorithm successively improves the control system's performance from one trial to the next by updating the control input based on the error signals from previous trials.
The term learning control in the robot motion control context was perhaps first used by Arimoto and his colleagues (cf. Arimoto et al., 1984; Arimoto et al., 1988). Arimoto defined learning control as the class of control algorithms that achieve asymptotic zero-error tracking by an iterative betterment process, which Arimoto called learning. In this process a single finite-horizon tracking task is repeatedly performed by the robot, always starting from the same initial condition. The control action at each trial is equal to the control action of the previous trial plus terms proportional to the tracking error and to its time derivative.
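For trial k on the horizon [0, T], the betterment law just described takes the form u_{k+1}(t) = u_k(t) + Φ e_k(t) + Γ ė_k(t), with e_k = y_d - y_k. The gain names Φ and Γ are our notation. The Python sketch below applies a discrete-time analogue to a hypothetical scalar plant x[t+1] = a x[t] + b u[t]. A one-sample difference stands in for the derivative, and the error is taken one sample ahead to match the plant's one-step input delay. The plant, the gains (kp, kd) and the reference are illustrative assumptions, not values from the paper.

```python
import numpy as np


def simulate(u, x0, a=0.3, b=1.0):
    """Hypothetical scalar plant x[t+1] = a*x[t] + b*u[t], output y = x; returns y[0..N]."""
    y = np.empty(len(u) + 1)
    y[0] = x0
    for t in range(len(u)):
        y[t + 1] = a * y[t] + b * u[t]
    return y


def betterment_ilc(y_d, trials=10, kp=0.6, kd=0.2):
    """Repeat the task from the same initial state, applying a PD-type betterment update."""
    u = np.zeros(len(y_d) - 1)  # feedforward input stored in memory between trials
    errors = []
    for _ in range(trials):
        y = simulate(u, x0=y_d[0])  # every trial restarts from the same initial condition
        e = y_d - y                 # e[0] == 0 because y[0] == y_d[0]
        errors.append(np.max(np.abs(e)))
        # u_{k+1}(t) = u_k(t) + kp*e_k(t+1) + kd*(e_k(t+1) - e_k(t)):
        # the one-sample lead matches the plant's one-step input delay.
        u = u + kp * e[1:] + kd * (e[1:] - e[:-1])
    return u, errors


t = np.linspace(0.0, 1.0, 51)
y_d = np.sin(2 * np.pi * t)  # desired finite-horizon trajectory (hypothetical)
u, errors = betterment_ilc(y_d)
print(f"max |e|: trial 1 = {errors[0]:.3f}, trial {len(errors)} = {errors[-1]:.2e}")
```

Every trial restarts from y_d[0], which mirrors the identical-initial-condition assumption of the betterment approach. For this plant and these gains the peak error shrinks monotonically from one trial to the next.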
Parallel to the development of the learning and betterment control schemes, a significant amount of research has been directed toward the application of repetitive control algorithms to robot trajectory tracking and other motion control problems (cf. Hara et al., 1988; Tomizuka et al., 1989; Tomizuka, 1992). The basic objective in repetitive control is to cancel an unknown periodic disturbance or to track an unknown periodic reference trajectory. In its simplest form, the periodic signal generator of many repetitive control algorithms closely resembles the betterment learning laws in (Arimoto et al., 1984; Arimoto et al., 1988). However, while the learning betterment controller acts during a finite time horizon, the repetitive controller acts continuously as a regulator. Moreover, the learning betterment approach assumes that, at every learning trial, the robot starts executing the task from the same initial condition. This is not the case in the repetitive control approach.
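The contrast with the betterment scheme can be sketched in a few lines. The Python example below assumes a hypothetical scalar plant x[k+1] = a x[k] + b u[k] + d[k]. It implements the simplest periodic signal generator, u(k) = u(k - N) + k_r e(k - N + 1). The controller knows the period N but neither the reference r nor the disturbance d, and the plant is never reset between periods. The gain k_r, the plant and the signals are illustrative assumptions.

```python
import numpy as np


def repetitive_control(periods=30, N=40, a=0.3, b=1.0, k_r=0.8):
    """Basic repetitive controller u(k) = u(k-N) + k_r*e(k-N+1) on a hypothetical plant.

    Returns the peak |e| in each period. The plant is never reset between periods.
    """
    n = periods * N
    k_all = np.arange(n + 1)
    r = np.sin(2 * np.pi * k_all / N)        # periodic reference, unknown to the controller
    d = 0.5 * np.cos(6 * np.pi * k_all / N)  # periodic disturbance, unknown to the controller
    y = np.zeros(n + 1)
    u = np.zeros(n)
    e = np.zeros(n + 1)
    for k in range(n):
        e[k] = r[k] - y[k]
        # Memory loop: values from one period earlier; history before k = 0 is zero.
        u_old = u[k - N] if k >= N else 0.0
        e_old = e[k - N + 1] if k >= N - 1 else 0.0
        u[k] = u_old + k_r * e_old
        y[k + 1] = a * y[k] + b * u[k] + d[k]
    return np.abs(e[:n]).reshape(periods, N).max(axis=1)


per_period = repetitive_control()
print(f"peak |e|: period 1 = {per_period[0]:.3f}, period {len(per_period)} = {per_period[-1]:.2e}")
```

For this loop, a sufficient condition for convergence is |1 - k_r b / (1 - a e^{-jω})| < 1 at all frequencies ω. It holds for the chosen a, b and k_r, so the periodic error dies out even though no trial is ever restarted.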
My interest in learning and repetitive control arose in 1987, as a consequence of studying the stability of a class of adaptive and repetitive controllers for robot manipulators with my former student and colleague Nader Sadegh. My colleague and friend Masayoshi Tomizuka had been working very actively in the area of repetitive control, and he introduced me to this problem. At that time there was much activity in the robotics and control communities toward finding adaptive control algorithms for robot manipulators that were rigorously proven to be asymptotically stable. The problem had recently been solved using passivity by Slotine and Li (1986), Sadegh and Horowitz (1987) and Wen and Baynard (1988). In contrast, most of the stability results in learning and repetitive control of that period relied on several unrealistic assumptions: either the robot dynamics were assumed to be linear, or they were assumed to be at least partially linearizable by feedback control. Moreover, most works assumed that the actual response of the robot was periodic or repeatable, even during the learning transient, and that joint accelerations could be measured directly. Nader and I had recently overcome some of these problems in our adaptive control research, and concluded that learning controllers could be synthesized and analyzed using a similar approach. For us the main appeal of learning and repetitive controllers lay in their simplicity. In most other approaches to robot trajectory control, including parametric adaptive control, it is necessary to compute the so-called inverse dynamic equations of the robot, and in many of these schemes these equations have to be computed in real time. In contrast, in the betterment learning and repetitive control schemes, the control action is generated by relatively simple functional adaptation algorithms.
Moreover, since most robot applications in industry involve the repeated execution of the same task, the idea of implementing a control algorithm that would “learn” through practice, without requiring any a priori knowledge of the structure of the robot equations of motion, was also very appealing. Our approach to the synthesis of learning controllers relied on the following insights: i) In learning control, motor dexterity is not gained through the use of feedback. Gaining dexterity through feedback was the approach used in most adaptive controllers at the time (cf. Slotine and Li, 1987; Sadegh and Horowitz, 1987; Ortega and Spong, 1989). In these adaptive schemes both the nonlinear control law and the parameter adaptation algorithm regressor are functions of the actual robot joint coordinates and velocities. ii) In contrast, in learning control algorithms, motor dexterity is gained through the use of a feedforward control action which is stored in memory and is retrieved as the task is executed. The learning process involves the functional adaptation of the feedforward action. iii) Feedback plays a fundamental role in stabilizing the system and in guaranteeing that the map between the feedforward function error and the tracking errors is strictly passive. It therefore became apparent to us that, in order to use passivity-based adaptive control results in the synthesis and analysis of learning and repetitive algorithms, it was necessary to formulate, and to prove the stability of, an adaptive control law that accomplishes the linearization of the robot dynamics by feedforward rather than feedback control. We presented these results in (Sadegh and Horowitz, 1990) with the introduction of the so-called Desired Compensation Adaptive Law (DCAL). In this adaptive scheme both the nonlinear control law and the parameter adaptation algorithm regressor are functions of the desired trajectories and velocities.
Subsequently we were able to synthesize repetitive controllers for robot arms by replacing the adaptive law in the DCAL with a repetitive control law (Sadegh et al., 1990).
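The structural point of the DCAL is that the feedforward term and the adaptation regressor are evaluated on the desired trajectory rather than on measured joint states. This can be illustrated on a single joint. The Python sketch below is a simplified, hypothetical example, not the published law. It assumes a one-link arm J q'' + b q' + mgl sin q = τ with unknown θ = [J, b, mgl]. The regressor W_d is formed from (q_d, q̇_d, q̈_d) only, and the controller combines PD feedback with a gradient-type update driven by s = ė + λe. The sketch omits any additional terms the published law may use to dominate the mismatch between W(q, q̇, q̈) and W_d, and all gains are illustrative.

```python
import numpy as np


def regressor(q, qdot, qddot):
    """W(q, q', q'') for the hypothetical arm J*q'' + b*q' + mgl*sin(q) = tau, theta = [J, b, mgl]."""
    return np.array([qddot, qdot, np.sin(q)])


def desired_compensation_sim(T=20.0, dt=1e-3, kp=100.0, kv=20.0, lam=5.0, gamma=20.0):
    """Simplified desired-compensation adaptive tracking of q_d(t) = sin(t); returns e(t), theta_hat."""
    theta = np.array([1.0, 0.5, 5.0])  # true parameters, unknown to the controller
    theta_hat = np.zeros(3)            # estimates start at zero
    q, qdot = 0.0, 1.0                 # start on the desired trajectory
    steps = int(round(T / dt))
    err = np.empty(steps)
    for i in range(steps):
        t = i * dt
        q_d, qdot_d, qddot_d = np.sin(t), np.cos(t), -np.sin(t)
        e, edot = q_d - q, qdot_d - qdot
        W_d = regressor(q_d, qdot_d, qddot_d)  # uses DESIRED states only, never measured ones
        tau = W_d @ theta_hat + kp * e + kv * edot
        theta_hat = theta_hat + dt * gamma * W_d * (edot + lam * e)  # gradient-type update
        # True plant, integrated with an explicit Euler step.
        qddot = (tau - theta[1] * qdot - theta[2] * np.sin(q)) / theta[0]
        q, qdot = q + dt * qdot, qdot + dt * qddot
        err[i] = e
    return err, theta_hat


err, theta_hat = desired_compensation_sim()
print(f"max |e| first 2 s: {np.max(np.abs(err[:2000])):.4f}, last 2 s: {np.max(np.abs(err[-2000:])):.4f}")
```

W_d depends only on the desired trajectory, so it could be computed offline and stored. This is the feedforward-from-memory structure described in insight ii).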
Unfortunately, as discussed in (Hara et al., 1988; Tomizuka et al., 1989; Sadegh et al., 1990), the asymptotic convergence of the basic repetitive control system can only be guaranteed under restrictive conditions on the plant dynamics or on the nature of the disturbance signals. These conditions are generally not satisfied in robot control applications. Most often, modifications in the update schemes are introduced, such as the so-called Q-filter modification (Hara et al., 1988; Tomizuka et al., 1989), that enhance the robustness of the repetitive controller at the expense of limiting its tracking performance. Likewise, the convergence of betterment learning schemes is proven by appealing to strict assumptions regarding the initial condition of the robot at the beginning of each learning trial. Another shortcoming of the betterment learning and repetitive control schemes discussed so far is that these algorithms were developed for the iterative learning of a single task. None of the research works in these areas provided a mechanism for extending the learning process so that a family of tasks could be learned simultaneously by the machine, or a systematic mechanism for using the dexterity gained in learning a particular task to subsequently perform a somewhat different task of a similar nature. After Nader left Berkeley to become a faculty member at the Georgia Institute of Technology, I began working on these problems with Bill Messner.
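A Q-filter modification can be written by placing a low-pass filter Q inside the memory loop, so that the update becomes u(k) = Q[u(k - N) + k_r e(k - N + 1)]. The previous period is already stored, so Q may be a zero-phase (noncausal) FIR filter. The Python sketch below assumes the simple choice Q(z) = (z + 2 + z^-1)/4, applied circularly over one stored period. It also assumes a stored error sequence that has already been advanced by the plant's one-sample delay. Both are illustrative conventions, not details taken from the cited works.

```python
import numpy as np


def q_filter(v):
    """Zero-phase low-pass Q(z) = (z + 2 + z^-1)/4, applied circularly over one stored period."""
    return 0.25 * np.roll(v, 1) + 0.5 * v + 0.25 * np.roll(v, -1)


def q_filtered_period_update(u_prev, e_prev_advanced, k_r=0.8):
    """One period of u_next = Q[u_prev + k_r * e_prev_advanced].

    e_prev_advanced is assumed (hypothetical convention) to be the previous period's
    error already shifted by the plant's one-sample delay.
    """
    return q_filter(u_prev + k_r * e_prev_advanced)


# Gain of this Q at frequency w (rad/sample): Q(e^{jw}) = 0.5 + 0.5*cos(w)
w = np.linspace(0.0, np.pi, 5)
print(np.round(0.5 + 0.5 * np.cos(w), 3))
```

The gain equals 1 only at ω = 0 and falls to 0 at the Nyquist frequency. The modified loop therefore stops accumulating high-frequency error components, which helps robustness, but it no longer drives the periodic error exactly to zero. This is the robustness-for-performance trade-off described above.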
Our research has revealed that the robustness limitations of the basic betterment and repetitive control laws, and the inability of these algorithms to learn multiple tasks, stem in part from the fact that all these schemes use point-to-point function adaptation algorithms. These algorithms update the value of the control input only at the current instant of time and do not provide a mechanism for updating the control input at neighboring points. However, in most applications the control function that must be identified is at least piecewise continuous. Thus, the value of the control at a given point will be almost the same as those at nearby points, and point-to-point function update laws do not take advantage of this. This issue also has implications for more general learning problems and for content-addressable memories. Consider, as an example, multi-task learning control algorithms for robot manipulators. In this application a function of several variables must be identified, namely the robot inverse dynamics. The trajectory used for training in betterment control cannot visit every point (or vector) in the domain of the function in a finite amount of time. Thus, the perfect identification of a control input function for one task using a point-to-point update law will not provide any information for generating the control input for other similar tasks unless the trajectories intersect, or some sort of interpolation is used. Similarly, in content-addressable memories it is desirable that the learning algorithm have an “interpolating” property, so that input vectors that are similar to previously learned input vectors, but novel to the system, produce output vectors that are similar to previously learned output vectors.
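The limitation of point-to-point updates can be made concrete with a lookup table. In the Python sketch below, a hypothetical one-dimensional example not taken from the paper, each update changes only the table entry at the visited grid index. After training on task A's points, the entries on task B's points, which never intersect task A's, are still at their initial value.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)


def f_true(i):
    """Hypothetical unknown function, evaluated at grid index (or index array) i."""
    return np.sin(2 * np.pi * grid[i])


def point_to_point_learn(table, visited, gain=0.5, trials=20):
    """Point-to-point update: each step changes only the entry at the visited index."""
    for _ in range(trials):
        for i in visited:
            table[i] += gain * (f_true(i) - table[i])
    return table


table = np.zeros_like(grid)
task_a = np.arange(0, 101, 2)  # task A visits the even grid indices
task_b = np.arange(1, 101, 2)  # task B uses the odd indices, which task A never visits
point_to_point_learn(table, task_a)
print("max error on task A:", np.max(np.abs(table[task_a] - f_true(task_a))))
print("max |estimate| on task B:", np.max(np.abs(table[task_b])))
```

Task A's entries converge to the target, but nothing learned there reaches task B. Reusing that knowledge would require some form of interpolation between neighboring entries.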
One solution to the interpolation problem in robot learning control was presented in (Miller, 1987) with the use of the so-called “cerebellar model arithmetic computer” (CMAC). In this algorithm an input vector is mapped to several locations in an intermediate memory, and the output vector is computed by summing the values stored in all the locations to which the input vector is mapped. The mapping has the property that nearby inputs map to overlapping regions of the intermediate memory, which causes interpolation to be performed automatically.
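A minimal one-dimensional CMAC-style approximator illustrates the overlapping-memory mechanism. In the Python sketch below, our own simplification rather than Miller's implementation, several tilings of the input range are offset from one another. An input activates one cell per tiling, the output is the sum of the active cells, and training spreads the error equally over them. The number of tilings, the tile width and the learning rate beta are illustrative assumptions.

```python
import numpy as np


class CMAC1D:
    """Minimal 1-D CMAC-style function approximator (illustrative parameters)."""

    def __init__(self, n_tilings=8, n_tiles=20, lo=0.0, hi=1.0):
        self.n_tilings = n_tilings
        self.lo = lo
        self.width = (hi - lo) / n_tiles
        # Each tiling is shifted by width / n_tilings; one extra cell keeps [lo, hi] covered.
        self.offsets = np.arange(n_tilings) * self.width / n_tilings
        self.weights = np.zeros((n_tilings, n_tiles + 1))
        self.rows = np.arange(n_tilings)

    def active_cells(self, x):
        """Index of the cell containing x in each tiling (one cell per tiling)."""
        return np.floor((x - self.lo + self.offsets) / self.width).astype(int)

    def predict(self, x):
        """Output = sum of the values stored in all cells the input maps to."""
        return self.weights[self.rows, self.active_cells(x)].sum()

    def train(self, x, target, beta=0.5):
        """Spread beta * (target - output) equally over the active cells."""
        error = target - self.predict(x)
        self.weights[self.rows, self.active_cells(x)] += beta * error / self.n_tilings


cmac = CMAC1D()
for _ in range(30):
    cmac.train(0.5, 1.0)  # train at a single input only
for x in (0.5, 0.52, 0.9):
    print(x, round(cmac.predict(x), 3))
```

The input 0.52 was never trained, yet it inherits part of the value learned at 0.5 through the cells the two inputs share. The distant input 0.9 shares no cells and stays at zero.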
In (Messner et al., 1991) we introduced a class of function identification algorithms for learning control systems based on integral transforms, in order to address the robustness and interpolation problems of the point-to-point repetitive and learning betterment controllers mentioned above. In these adaptive learning algorithms, unknown functions are defined in terms of integral equations of the first kind, which consist of known kernels and unknown influence functions. The learning process involves the indirect estimation of the unknown functions by estimating the influence functions. The entire influence function is modified in proportion to the value of the kernel at each point. Thus, the use of the kernel both in the update of the influence functions and in the generation of the function estimate provides these algorithms with desirable interpolation and smoothing properties, and overcomes many of the limitations of prior point-to-point betterment and repetitive control schemes concerning the estimation of multivariable functions. Moreover, the use of integral transforms makes it possible to demonstrate strong stability and convergence results for these learning algorithms.
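In one dimension the idea can be sketched as follows. The unknown function is represented as f(x) = ∫ K(x, y) c(y) dy, with a known kernel K and an unknown influence function c. Each error sample e at input x then updates the entire estimate of c in proportion to K(x, ·). The Python sketch below discretizes y, assumes a Gaussian kernel, and trains only on every other point of a grid. The kernel, its width, the gain and the target function are illustrative assumptions, not the kernels or gains used in (Messner et al., 1991).

```python
import numpy as np

y = np.linspace(0.0, 1.0, 201)  # discretization of the influence function's domain
dy = y[1] - y[0]
c = np.zeros_like(y)            # estimate of the influence function c(y)


def kernel(x, y, width=0.05):
    """Gaussian kernel K(x, y); the Gaussian shape is an assumption of this sketch."""
    return np.exp(-0.5 * ((x - y) / width) ** 2)


def f_hat(x):
    """Function estimate f_hat(x) = integral of K(x, y) c(y) dy, as a Riemann sum."""
    return np.sum(kernel(x, y) * c) * dy


def f_true(x):
    """Hypothetical unknown function to be identified."""
    return np.sin(2 * np.pi * x)


grid = np.linspace(0.0, 1.0, 51)
train_x = grid[::2]   # points visited during training
test_x = grid[1::2]   # held-out points between them, never visited
gamma = 2.0
for _ in range(200):
    for x in train_x:
        e = f_true(x) - f_hat(x)
        c += gamma * kernel(x, y) * e  # the whole influence function moves, weighted by K(x, .)

train_err = max(abs(f_true(x) - f_hat(x)) for x in train_x)
test_err = max(abs(f_true(x) - f_hat(x)) for x in test_x)
print(f"max error: training points {train_err:.3f}, held-out points {test_err:.3f}")
```

The held-out points were never visited, yet their estimates are close to the target. Every update also changed c near them through the kernel, whereas a point-to-point table trained on the same points would leave them untouched.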
In the remainder of the paper we discuss the use of learning control in the robot tracking control context, and stress the similarities and differences between betterment learning schemes, repetitive control schemes and learning schemes based on integral transforms. Conclusions and reflections on some of the outstanding problems in this area are included in the last section.