public inbox for pve-devel@lists.proxmox.com
 help / color / mirror / Atom feed
* [pve-devel] [RFC qemu-server++ 0/22] remote migration
@ 2021-04-13 12:16 Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox 1/2] websocket: make field public Fabian Grünbichler
                   ` (22 more replies)
  0 siblings, 23 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

this series adds remote migration for VMs. there's still plenty of
TODOs/FIXMEs/stuff that requires discussion, hence the RFC. live
migration with NBD and storage-migrated disks should work already.

the performance bottle neck (~190MB/s on loopback) for the websocket
connection seems to be in pveproxy at the moment - the rust code should
manage about 700MB/s.

overview over affected repos and changes, see individual patches for
more details.

proxmox:

some compatible changes to make websocket code usable for client-side
connections, required by proxmox-websocket-tunnel

proxmox-websocket-tunnel:

new tunnel helper tool for forwarding commands and data over websocket
connections, required by qemu-server on source side
TODO: better error handling
TODO: fingerprint checking/valid certs/..
TODO: WS key generation
TODO: decide on mask?
TODO: investigate performance bottlenecks once PVE api server gets
faster

pve-access-control:

new ticket type, required by qemu-server on target side

pve-cluster:

new remote.cfg and related helpers, required by qemu-server on source
side
TODO: ACLs, CLI, API for managing config
TODO: handling of discovered nodes with valid certificates
TODO: add additional information like default bwlimits, storage/bridge
mappings

pve-common:

bridgepair format akin to storage pair, pve-bridge-id option, required
by qemu-server
TODO: adapt pve-container

pve-guest-common:

handle remote migration (no SSH) in AbstractMigrate,
required by qemu-server

pve-manager:

new 'addr' endpoint for retrieving remote node IPs, required on target
node

pve-storage:

extend 'pvesm import' to allow import from UNIX socket, required on
target node by qemu-server

qemu-server:

some refactoring, new mtunnel endpoints, new remote_migration endpoints
TODO: check remote ACLs
TODO: handle pending changes and snapshots
TODO: CLI for remote migration
potential TODO: expose remote info via additional endpoints (resources? vmids?
permissions? ...)

as usual, some of the patches are best viewed with '-w', especially in
qemu-server..




^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH proxmox 1/2] websocket: make field public
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox 2/2] websocket: adapt for client connection Fabian Grünbichler
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 proxmox/src/tools/websocket.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proxmox/src/tools/websocket.rs b/proxmox/src/tools/websocket.rs
index 57e2591..8685aab 100644
--- a/proxmox/src/tools/websocket.rs
+++ b/proxmox/src/tools/websocket.rs
@@ -658,7 +658,7 @@ pub const MAGIC_WEBSOCKET_GUID: &str = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
 
 /// Provides methods for connecting a WebSocket endpoint with another
 pub struct WebSocket {
-    text: bool,
+    pub text: bool,
 }
 
 impl WebSocket {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH proxmox 2/2] websocket: adapt for client connection
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox 1/2] websocket: make field public Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox-websocket-tunnel 1/2] initial commit Fabian Grünbichler
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

previously, this was only used for the server side handling of web
sockets. by making the mask part of the WebSocket struct and making some
of the fns associated, we can re-use this for client-side connections
such as in proxmox-websocket-tunnel.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 proxmox/src/tools/websocket.rs | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/proxmox/src/tools/websocket.rs b/proxmox/src/tools/websocket.rs
index 8685aab..e3a559b 100644
--- a/proxmox/src/tools/websocket.rs
+++ b/proxmox/src/tools/websocket.rs
@@ -659,6 +659,7 @@ pub const MAGIC_WEBSOCKET_GUID: &str = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
 /// Provides methods for connecting a WebSocket endpoint with another
 pub struct WebSocket {
     pub text: bool,
+    pub mask: Option<[u8; 4]>,
 }
 
 impl WebSocket {
@@ -710,10 +711,13 @@ impl WebSocket {
             .header(SEC_WEBSOCKET_PROTOCOL, ws_proto)
             .body(Body::empty())?;
 
-        Ok((Self { text }, response))
+        let mask = None;
+
+        Ok((Self { text, mask }, response))
     }
 
-    async fn handle_channel_message<W>(
+    pub async fn handle_channel_message<W>(
+        &self,
         result: WebSocketReadResult,
         writer: &mut WebSocketWriter<W>,
     ) -> Result<OpCode, Error>
@@ -722,11 +726,11 @@ impl WebSocket {
     {
         match result {
             Ok((OpCode::Ping, msg)) => {
-                writer.send_control_frame(None, OpCode::Pong, &msg).await?;
+                writer.send_control_frame(self.mask, OpCode::Pong, &msg).await?;
                 Ok(OpCode::Pong)
             }
             Ok((OpCode::Close, msg)) => {
-                writer.send_control_frame(None, OpCode::Close, &msg).await?;
+                writer.send_control_frame(self.mask, OpCode::Close, &msg).await?;
                 Ok(OpCode::Close)
             }
             Ok((opcode, _)) => {
@@ -735,7 +739,7 @@ impl WebSocket {
             }
             Err(err) => {
                 writer
-                    .send_control_frame(None, OpCode::Close, &err.generate_frame_payload())
+                    .send_control_frame(self.mask, OpCode::Close, &err.generate_frame_payload())
                     .await?;
                 Err(Error::from(err))
             }
@@ -743,6 +747,7 @@ impl WebSocket {
     }
 
     async fn copy_to_websocket<R, W>(
+        &self,
         mut reader: &mut R,
         mut writer: &mut WebSocketWriter<W>,
         receiver: &mut mpsc::UnboundedReceiver<WebSocketReadResult>,
@@ -759,7 +764,7 @@ impl WebSocket {
                     res = buf.read_from_async(&mut reader).fuse() => res?,
                     res = receiver.recv().fuse() => {
                         let res = res.ok_or_else(|| format_err!("control channel closed"))?;
-                        match Self::handle_channel_message(res, &mut writer).await? {
+                        match self.handle_channel_message(res, &mut writer).await? {
                             OpCode::Close => return Ok(true),
                             _ => { continue; },
                         }
@@ -799,10 +804,10 @@ impl WebSocket {
 
         let (tx, mut rx) = mpsc::unbounded_channel();
         let mut wsreader = WebSocketReader::new(usreader, tx);
-        let mut wswriter = WebSocketWriter::new(None, self.text, uswriter);
+        let mut wswriter = WebSocketWriter::new(self.mask, self.text, uswriter);
 
         let ws_future = tokio::io::copy(&mut wsreader, &mut dswriter);
-        let term_future = Self::copy_to_websocket(&mut dsreader, &mut wswriter, &mut rx);
+        let term_future = self.copy_to_websocket(&mut dsreader, &mut wswriter, &mut rx);
 
         let res = select! {
             res = ws_future.fuse() => match res {
@@ -812,7 +817,7 @@ impl WebSocket {
             res = term_future.fuse() => match res {
                 Ok(sent_close) if !sent_close => {
                     // status code 1000 => 0x03E8
-                    wswriter.send_control_frame(None, OpCode::Close, &WebSocketErrorKind::Normal.to_be_bytes()).await?;
+                    wswriter.send_control_frame(self.mask, OpCode::Close, &WebSocketErrorKind::Normal.to_be_bytes()).await?;
                     Ok(())
                 }
                 Ok(_) => Ok(()),
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH proxmox-websocket-tunnel 1/2] initial commit
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox 1/2] websocket: make field public Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox 2/2] websocket: adapt for client connection Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox-websocket-tunnel 2/2] add tunnel implementation Fabian Grünbichler
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 .gitignore    |  1 +
 .cargo/config |  5 +++++
 Cargo.toml    | 11 +++++++++++
 3 files changed, 17 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 .cargo/config
 create mode 100644 Cargo.toml

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..ea8c4bf
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1 @@
+/target
diff --git a/.cargo/config b/.cargo/config
new file mode 100644
index 0000000..3b5b6e4
--- /dev/null
+++ b/.cargo/config
@@ -0,0 +1,5 @@
+[source]
+[source.debian-packages]
+directory = "/usr/share/cargo/registry"
+[source.crates-io]
+replace-with = "debian-packages"
diff --git a/Cargo.toml b/Cargo.toml
new file mode 100644
index 0000000..939184c
--- /dev/null
+++ b/Cargo.toml
@@ -0,0 +1,11 @@
+[package]
+name = "proxmox-websocket-tunnel"
+version = "0.1.0"
+authors = ["Fabian Grünbichler <f.gruenbichler@proxmox.com>"]
+edition = "2018"
+license = "AGPL-3"
+description = "Proxmox websocket tunneling helper"
+
+exclude = ["debian"]
+
+[dependencies]
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH proxmox-websocket-tunnel 2/2] add tunnel implementation
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (2 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox-websocket-tunnel 1/2] initial commit Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH access-control 1/2] tickets: add tunnel ticket Fabian Grünbichler
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

the websocket tunnel helper accepts control commands (encoded as
single-line JSON) on stdin, and prints responses on stdout.

the following commands are available:
- "connect" a 'control' tunnel via a websocket
- "forward" a local unix socket to a remote socket via a websocket
-- if requested, this will ask for a ticket via the control tunnel after
accepting a new connection on the unix socket
- "close" the control tunnel and any forwarded socket

any other json input (without the 'control' flag set) is forwarded as-is
to the remote end of the control tunnel.

internally, the tunnel helper will spawn tokio tasks for
- handling the control tunnel connection (new commands are passed in via
an mpsc channel together with a oneshot channel for the response)
- handling accepting new connections on each forwarded unix socket
- handling forwarding data over accepted forwarded connections

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    proxmox_backup dependency is just there for HTTP client code that should be extracted somewhere else
    proxmox dependency requires websocket patches exposing a bit more stuff for client usage

    full repo available on my staff git..

 Cargo.toml  |  14 ++
 src/main.rs | 396 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 410 insertions(+)
 create mode 100644 src/main.rs

diff --git a/Cargo.toml b/Cargo.toml
index 939184c..18ba297 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -9,3 +9,17 @@ description = "Proxmox websocket tunneling helper"
 exclude = ["debian"]
 
 [dependencies]
+anyhow = "1.0"
+futures = "0.3"
+futures-util = "0.3"
+hyper = { version = "0.14" }
+openssl = "0.10"
+percent-encoding = "2"
+proxmox = { version = "0.11", path = "../proxmox/proxmox", features = ["websocket"] }
+# just for tools::http and tools::async_io::EitherStream, need to move them somewhere else
+proxmox-backup = { version = "1.0.8", path = "../proxmox-backup" }
+serde = { version = "1.0", features = ["derive"] }
+serde_json = "1.0"
+tokio = { version = "1", features = ["io-std", "io-util", "macros", "rt-multi-thread", "sync"] }
+tokio-stream = { version = "0.1", features = ["io-util"] }
+tokio-util = "0.6"
diff --git a/src/main.rs b/src/main.rs
new file mode 100644
index 0000000..6d21352
--- /dev/null
+++ b/src/main.rs
@@ -0,0 +1,396 @@
+use anyhow::{bail, format_err, Error};
+
+use std::collections::VecDeque;
+use std::sync::{Arc, Mutex};
+
+use futures::future::FutureExt;
+use futures::select;
+
+use hyper::client::{Client, HttpConnector};
+use hyper::header::{SEC_WEBSOCKET_KEY, SEC_WEBSOCKET_PROTOCOL, SEC_WEBSOCKET_VERSION, UPGRADE};
+use hyper::upgrade::Upgraded;
+use hyper::{Body, Request, StatusCode};
+
+use openssl::ssl::{SslConnector, SslMethod};
+use percent_encoding::{utf8_percent_encode, NON_ALPHANUMERIC};
+
+use serde::{Deserialize, Serialize};
+use serde_json::{Map, Value};
+use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
+use tokio::net::{UnixListener, UnixStream};
+use tokio::sync::{mpsc, oneshot};
+use tokio_stream::wrappers::LinesStream;
+use tokio_stream::StreamExt;
+
+use proxmox::tools::websocket::{OpCode, WebSocket, WebSocketReader, WebSocketWriter};
+use proxmox_backup::tools::http::HttpsConnector;
+
+#[derive(Serialize, Deserialize, Debug)]
+#[serde(rename_all = "kebab-case")]
+enum CmdType {
+    Connect,
+    Forward,
+    CloseCmd,
+    NonControl,
+}
+
+type CmdData = Map<String, Value>;
+
+#[derive(Serialize, Deserialize, Debug)]
+#[serde(rename_all = "kebab-case")]
+struct ConnectCmdData {
+    // target URL for WS connection
+    url: String,
+    // fingerprint of TLS certificate
+    fingerprint: Option<String>,
+    // addition headers such as authorization
+    headers: Option<Vec<(String, String)>>,
+}
+
+#[derive(Serialize, Deserialize, Debug, Clone)]
+#[serde(rename_all = "kebab-case")]
+struct ForwardCmdData {
+    // target URL for WS connection
+    url: String,
+    // addition headers such as authorization
+    headers: Option<Vec<(String, String)>>,
+    // fingerprint of TLS certificate
+    fingerprint: Option<String>,
+    // local UNIX socket path for forwarding
+    unix: String,
+    // request ticket using these parameters
+    ticket: Option<Map<String, Value>>,
+}
+
+struct CtrlTunnel {
+    sender: Option<mpsc::UnboundedSender<(Value, oneshot::Sender<String>)>>,
+    forwarded: Arc<Mutex<Vec<oneshot::Sender<()>>>>,
+}
+
+impl CtrlTunnel {
+    async fn read_cmd_loop(mut self) -> Result<(), Error> {
+        let mut stdin_stream = LinesStream::new(BufReader::new(tokio::io::stdin()).lines());
+        while let Some(res) = stdin_stream.next().await {
+            match res {
+                Ok(line) => {
+                    let (cmd_type, data) = Self::parse_cmd(&line)?;
+                    match cmd_type {
+                        CmdType::Connect => self.handle_connect_cmd(data).await,
+                        CmdType::Forward => {
+                            let res = self.handle_forward_cmd(data).await;
+                            match &res {
+                                Ok(()) => println!("{}", serde_json::json!({"success": true})),
+                                Err(msg) => println!(
+                                    "{}",
+                                    serde_json::json!({"success": false, "msg": msg.to_string()})
+                                ),
+                            };
+                            res
+                        }
+                        CmdType::NonControl => self
+                            .handle_tunnel_cmd(data)
+                            .await
+                            .map(|res| println!("{}", res)),
+                        _ => unimplemented!(),
+                    }
+                }
+                Err(err) => bail!("Failed to read from STDIN - {}", err),
+            }?;
+        }
+
+        Ok(())
+    }
+
+    fn parse_cmd(line: &str) -> Result<(CmdType, CmdData), Error> {
+        let mut json: Map<String, Value> = serde_json::from_str(line)?;
+        match json.remove("control") {
+            Some(Value::Bool(true)) => {
+                match json.remove("cmd").map(serde_json::from_value::<CmdType>) {
+                    None => bail!("input has 'control' flag, but no control 'cmd' set.."),
+                    Some(Err(e)) => bail!("failed to parse control cmd - {}", e),
+                    Some(Ok(cmd_type)) => Ok((cmd_type, json)),
+                }
+            }
+            _ => Ok((CmdType::NonControl, json)),
+        }
+    }
+
+    async fn websocket_connect(
+        url: String,
+        text: bool,
+        extra_headers: Vec<(String, String)>,
+        fingerprint: Option<String>,
+    ) -> Result<Upgraded, Error> {
+        let mut req = Request::builder()
+            .uri(url)
+            .header(UPGRADE, "websocket")
+            .header(SEC_WEBSOCKET_VERSION, "13")
+            .header(SEC_WEBSOCKET_KEY, "foobar") // TODO
+            .header(SEC_WEBSOCKET_PROTOCOL, if text { "text" } else { "binary" })
+            .body(Body::empty())
+            .unwrap();
+
+        let headers = req.headers_mut();
+        for (name, value) in extra_headers {
+            let name = hyper::header::HeaderName::from_bytes(name.as_bytes())?;
+            let value = hyper::header::HeaderValue::from_str(&value)?;
+            headers.insert(name, value);
+        }
+
+        let mut ssl_connector_builder = SslConnector::builder(SslMethod::tls()).unwrap();
+        if fingerprint.is_some() {
+            // FIXME actually verify fingerprint via callback!
+            ssl_connector_builder.set_verify(openssl::ssl::SslVerifyMode::NONE);
+        } else {
+            ssl_connector_builder.set_verify(openssl::ssl::SslVerifyMode::PEER);
+        }
+
+        let mut httpc = HttpConnector::new();
+        httpc.enforce_http(false); // we want https...
+        httpc.set_connect_timeout(Some(std::time::Duration::new(10, 0)));
+        let https = HttpsConnector::with_connector(httpc, ssl_connector_builder.build());
+
+        let client = Client::builder().build::<_, Body>(https);
+        let res = client.request(req).await?;
+        if res.status() != StatusCode::SWITCHING_PROTOCOLS {
+            bail!("server didn't upgrade: {}", res.status());
+        }
+
+        hyper::upgrade::on(res)
+            .await
+            .map_err(|err| format_err!("failed to upgrade - {}", err))
+    }
+
+    async fn handle_connect_cmd(&mut self, mut data: CmdData) -> Result<(), Error> {
+        let mut data: ConnectCmdData = data
+            .remove("data")
+            .ok_or_else(|| format_err!("'connect' command missing 'data'"))
+            .map(serde_json::from_value)??;
+
+        if self.sender.is_some() {
+            bail!("already connected!");
+        }
+
+        let upgraded = Self::websocket_connect(
+            data.url.clone(),
+            false,
+            data.headers.take().unwrap_or_else(Vec::new),
+            data.fingerprint.take(),
+        )
+        .await?;
+
+        let (tx, rx) = mpsc::unbounded_channel();
+        self.sender = Some(tx);
+        tokio::spawn(async move {
+            if let Err(err) = Self::handle_ctrl_tunnel(upgraded, rx).await {
+                eprintln!("Tunnel to {} failed - {}", data.url, err);
+            }
+        });
+
+        Ok(())
+    }
+
+    async fn handle_forward_cmd(&mut self, mut data: CmdData) -> Result<(), Error> {
+        let data: ForwardCmdData = data
+            .remove("data")
+            .ok_or_else(|| format_err!("'forward' command missing 'data'"))
+            .map(serde_json::from_value)??;
+
+        if self.sender.is_none() && data.ticket.is_some() {
+            bail!("dynamically requesting ticket requires cmd tunnel connection!");
+        }
+
+        let unix_listener = UnixListener::bind(data.unix.clone()).unwrap();
+        let (tx, rx) = oneshot::channel();
+        let data = Arc::new(data);
+
+        self.forwarded.lock().unwrap().push(tx);
+        let cmd_sender = self.sender.clone();
+
+        tokio::spawn(async move {
+            let mut rx = rx.fuse();
+            loop {
+                let accept = unix_listener.accept().fuse();
+                tokio::pin!(accept);
+                let data2 = data.clone();
+                select! {
+                    _ = rx => {
+                        eprintln!("received shutdown signal, closing unix listener stream");
+                    },
+                    res = accept => match res {
+                        Ok((unix_stream, _)) => {
+                            eprintln!("accepted new connection on '{}'", data2.unix);
+                            let data3: Result<Arc<ForwardCmdData>, Error> = match (&cmd_sender, &data2.ticket) {
+                                (Some(cmd_sender), Some(_)) => Self::get_ticket(cmd_sender, data2.clone()).await,
+                                _ => Ok(data2.clone()),
+                            };
+
+                            match data3 {
+                                Ok(data3) => {
+                                    tokio::spawn(async move {
+                                        if let Err(err) = Self::handle_forward_tunnel(data3.clone(), unix_stream).await {
+                                            eprintln!("Tunnel for {} failed - {}", data3.unix, err);
+                                        }
+                                    });
+                                },
+                                Err(err) => {
+                                    eprintln!("Failed to accept unix connection - {}", err);
+                                },
+                            };
+                        },
+                        Err(err) => eprintln!("Failed to accept unix connection on {} - {}", data2.unix, err),
+                    },
+                };
+            }
+        });
+
+        Ok(())
+    }
+
+    async fn handle_tunnel_cmd(&mut self, data: CmdData) -> Result<String, Error> {
+        match &mut self.sender {
+            None => bail!("not connected!"),
+            Some(sender) => {
+                let data: Value = data.into();
+                let (tx, rx) = oneshot::channel::<String>();
+                eprintln!("tunnel request: '{}'", data.to_string());
+                sender.send((data, tx))?;
+                let res = rx.await?;
+                eprintln!("tunnel response: '{}'", res);
+                Ok(res)
+            }
+        }
+    }
+
+    async fn handle_ctrl_tunnel(
+        websocket: Upgraded,
+        mut cmd_receiver: mpsc::UnboundedReceiver<(Value, oneshot::Sender<String>)>,
+    ) -> Result<(), Error> {
+        let (tunnel_reader, tunnel_writer) = tokio::io::split(websocket);
+        let (ws_close_tx, mut ws_close_rx) = mpsc::unbounded_channel();
+        let ws_reader = WebSocketReader::new(tunnel_reader, ws_close_tx);
+        let mut ws_writer = WebSocketWriter::new(Some([0, 0, 0, 0]), false, tunnel_writer);
+
+        let mut framed_reader =
+            tokio_util::codec::FramedRead::new(ws_reader, tokio_util::codec::LinesCodec::new());
+
+        let mut resp_tx_queue: VecDeque<oneshot::Sender<String>> = VecDeque::new();
+        let mut shutting_down = false;
+
+        loop {
+            select! {
+                res = ws_close_rx.recv().fuse() => {
+                    let res = res.ok_or_else(|| format_err!("WS control channel closed"))?;
+                    eprintln!("WS: received control message: '{:?}'", res);
+                },
+                res = framed_reader.next().fuse() => {
+                    match res {
+                        None if shutting_down => {
+                            eprintln!("WS closed");
+                            break;
+                        },
+                        None => bail!("WS closed unexpectedly"),
+                        Some(Ok(res)) => {
+                            resp_tx_queue
+                                .pop_front()
+                                .ok_or_else(|| format_err!("no response handler"))?
+                                .send(res)
+                                .map_err(|msg| format_err!("failed to send tunnel response '{}' back to requester - receiver already closed?", msg))?;
+                        },
+                        Some(Err(err)) => {
+                            eprintln!("WS: received failed - {}", err);
+                            // TODO handle?
+                        },
+                    }
+                },
+                res = cmd_receiver.recv().fuse() => {
+                    if shutting_down { continue };
+                    match res {
+                        None => {
+                            eprintln!("CMD channel closed, shutting down");
+                            ws_writer.send_control_frame(Some([1,2,3,4]), OpCode::Close, &[]).await?;
+                            shutting_down = true;
+                        },
+                        Some((msg, resp_tx)) => {
+                            resp_tx_queue.push_back(resp_tx);
+
+                            let line = format!("{}\n", msg);
+                            ws_writer.write_all(line.as_bytes()).await?;
+                            ws_writer.flush().await?;
+                        },
+                    }
+                },
+            };
+        }
+
+        Ok(())
+    }
+
+    async fn handle_forward_tunnel(
+        data: Arc<ForwardCmdData>,
+        unix: UnixStream,
+    ) -> Result<(), Error> {
+        let upgraded = Self::websocket_connect(
+            data.url.clone(),
+            false,
+            data.headers.clone().unwrap_or_else(Vec::new),
+            data.fingerprint.clone(),
+        )
+        .await?;
+
+        let ws = WebSocket { text: false, mask: Some([0, 0, 0, 0]) };
+        eprintln!("established new WS for forwarding '{}'", data.unix);
+        ws.serve_connection(upgraded, unix).await?;
+
+        eprintln!(
+            "done handling forwarded connection from '{}'",
+            data.unix
+        );
+
+        Ok(())
+    }
+
+    async fn get_ticket(
+        cmd_sender: &mpsc::UnboundedSender<(Value, oneshot::Sender<String>)>,
+        cmd_data: Arc<ForwardCmdData>,
+    ) -> Result<Arc<ForwardCmdData>, Error> {
+        eprintln!("requesting WS ticket via tunnel");
+        let ticket_cmd = match cmd_data.ticket.clone() {
+            Some(mut ticket_cmd) => {
+                ticket_cmd.insert("cmd".to_string(), serde_json::json!("ticket"));
+                ticket_cmd
+            },
+            None => bail!("can't get ticket without ticket parameters"),
+        };
+        let (tx, rx) = oneshot::channel::<String>();
+        cmd_sender.send((serde_json::json!(ticket_cmd), tx))?;
+        let ticket = rx.await?;
+        let mut ticket: Map<String, Value> = serde_json::from_str(&ticket)?;
+        let ticket = ticket.remove("ticket")
+            .ok_or_else(|| format_err!("failed to retrieve ticket via tunnel"))?;
+
+        let ticket = ticket.as_str().ok_or_else(|| format_err!("failed to format received ticket"))?;
+        let ticket = utf8_percent_encode(&ticket, NON_ALPHANUMERIC).to_string();
+
+        let mut data = cmd_data.clone();
+        let mut url = data.url.clone();
+        url.push_str("ticket=");
+        url.push_str(&ticket);
+        let mut d = Arc::make_mut(&mut data);
+        d.url = url;
+        Ok(data)
+    }
+}
+
+#[tokio::main]
+async fn main() -> Result<(), Error> {
+    do_main().await
+}
+
+async fn do_main() -> Result<(), Error> {
+    let tunnel = CtrlTunnel {
+        sender: None,
+        forwarded: Arc::new(Mutex::new(Vec::new())),
+    };
+    tunnel.read_cmd_loop().await
+}
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH access-control 1/2] tickets: add tunnel ticket
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (3 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH proxmox-websocket-tunnel 2/2] add tunnel implementation Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH access-control 2/2] ticket: normalize path for verification Fabian Grünbichler
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

just like VNC ticket, but different prefix to prevent confusion.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 PVE/AccessControl.pm | 50 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/PVE/AccessControl.pm b/PVE/AccessControl.pm
index 8b5be1e..9d9a4bd 100644
--- a/PVE/AccessControl.pm
+++ b/PVE/AccessControl.pm
@@ -445,12 +445,8 @@ sub verify_token {
     return wantarray ? ($tokenid) : $tokenid;
 }
 
-
-# VNC tickets
-# - they do not contain the username in plain text
-# - they are restricted to a specific resource path (example: '/vms/100')
-sub assemble_vnc_ticket {
-    my ($username, $path) = @_;
+my $assemble_short_lived_ticket = sub {
+    my ($prefix, $username, $path) = @_;
 
     my $rsa_priv = get_privkey();
 
@@ -459,11 +455,11 @@ sub assemble_vnc_ticket {
     my $secret_data = "$username:$path";
 
     return PVE::Ticket::assemble_rsa_ticket(
-	$rsa_priv, 'PVEVNC', undef, $secret_data);
-}
+	$rsa_priv, $prefix, undef, $secret_data);
+};
 
-sub verify_vnc_ticket {
-    my ($ticket, $username, $path, $noerr) = @_;
+my $verify_short_lived_ticket = sub {
+    my ($ticket, $prefix, $username, $path, $noerr) = @_;
 
     my $secret_data = "$username:$path";
 
@@ -473,12 +469,42 @@ sub verify_vnc_ticket {
 	    return undef;
 	} else {
 	    # raise error via undef ticket
-	    PVE::Ticket::verify_rsa_ticket($rsa_pub, 'PVEVNC');
+	    PVE::Ticket::verify_rsa_ticket($rsa_pub, $prefix);
 	}
     }
 
     return PVE::Ticket::verify_rsa_ticket(
-	$rsa_pub, 'PVEVNC', $ticket, $secret_data, -20, 40, $noerr);
+	$rsa_pub, $prefix, $ticket, $secret_data, -20, 40, $noerr);
+};
+
+# VNC tickets
+# - they do not contain the username in plain text
+# - they are restricted to a specific resource path (example: '/vms/100')
+sub assemble_vnc_ticket {
+    my ($username, $path) = @_;
+
+    return $assemble_short_lived_ticket->('PVEVNC', $username, $path);
+}
+
+sub verify_vnc_ticket {
+    my ($ticket, $username, $path, $noerr) = @_;
+
+    return $verify_short_lived_ticket->($ticket, 'PVEVNC', $username, $path, $noerr);
+}
+
+# Tunnel tickets
+# - they do not contain the username in plain text
+# - they are restricted to a specific resource path (example: '/vms/100', '/socket/run/qemu-server/123.storage')
+sub assemble_tunnel_ticket {
+    my ($username, $path) = @_;
+
+    return $assemble_short_lived_ticket->('PVETUNNEL', $username, $path);
+}
+
+sub verify_tunnel_ticket {
+    my ($ticket, $username, $path, $noerr) = @_;
+
+    return $verify_short_lived_ticket->($ticket, 'PVETUNNEL', $username, $path, $noerr);
 }
 
 sub assemble_spice_ticket {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH access-control 2/2] ticket: normalize path for verification
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (4 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH access-control 1/2] tickets: add tunnel ticket Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 1/4] remote.cfg: add new config file Fabian Grünbichler
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 PVE/AccessControl.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/PVE/AccessControl.pm b/PVE/AccessControl.pm
index 9d9a4bd..7949fde 100644
--- a/PVE/AccessControl.pm
+++ b/PVE/AccessControl.pm
@@ -461,6 +461,8 @@ my $assemble_short_lived_ticket = sub {
 my $verify_short_lived_ticket = sub {
     my ($ticket, $prefix, $username, $path, $noerr) = @_;
 
+    $path = normalize_path($path);
+
     my $secret_data = "$username:$path";
 
     my ($rsa_pub, $rsa_mtime) = get_pubkey();
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH cluster 1/4] remote.cfg: add new config file
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (5 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH access-control 2/2] ticket: normalize path for verification Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 2/4] add get_remote_info Fabian Grünbichler
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

with two section/entry types:

pve-cluster, referencing at least one node + an API token
pve-node, containing the connection information (address + optional
fingerprint)

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 data/PVE/Makefile                  |   2 +-
 data/src/status.c                  |   1 +
 data/PVE/Cluster.pm                |   1 +
 data/PVE/RemoteConfig.pm           | 226 +++++++++++++++++++++++++++++
 debian/libpve-cluster-perl.install |   1 +
 5 files changed, 230 insertions(+), 1 deletion(-)
 create mode 100644 data/PVE/RemoteConfig.pm

diff --git a/data/PVE/Makefile b/data/PVE/Makefile
index c8e0d2d..3c28597 100644
--- a/data/PVE/Makefile
+++ b/data/PVE/Makefile
@@ -11,7 +11,7 @@ PVE_VENDORARCH=${DESTDIR}/${PERL_VENDORARCH}/auto/PVE/IPCC
 PERL_DOC_INC_DIRS:=..
 
 SUBDIRS=Cluster CLI API2
-SOURCES=IPCC.pm Cluster.pm Corosync.pm RRD.pm DataCenterConfig.pm SSHInfo.pm
+SOURCES=IPCC.pm Cluster.pm Corosync.pm RRD.pm DataCenterConfig.pm SSHInfo.pm RemoteConfig.pm
 
 all:
 
diff --git a/data/src/status.c b/data/src/status.c
index 8e93221..40782d5 100644
--- a/data/src/status.c
+++ b/data/src/status.c
@@ -101,6 +101,7 @@ static memdb_change_t memdb_change_array[] = {
 	{ .path = "sdn/controllers.cfg" },
 	{ .path = "sdn/.version" },
 	{ .path = "virtual-guest/cpu-models.conf" },
+	{ .path = "remote.cfg" },
 };
 
 static GMutex mutex;
diff --git a/data/PVE/Cluster.pm b/data/PVE/Cluster.pm
index 210ea85..0f636af 100644
--- a/data/PVE/Cluster.pm
+++ b/data/PVE/Cluster.pm
@@ -71,6 +71,7 @@ my $observed = {
     'sdn/controllers.cfg' => 1,
     'sdn/.version' => 1,
     'virtual-guest/cpu-models.conf' => 1,
+    'remote.cfg' => 1,
 };
 
 sub prepare_observed_file_basedirs {
diff --git a/data/PVE/RemoteConfig.pm b/data/PVE/RemoteConfig.pm
new file mode 100644
index 0000000..23274de
--- /dev/null
+++ b/data/PVE/RemoteConfig.pm
@@ -0,0 +1,226 @@
+package PVE::RemoteConfig;
+
+use strict;
+use warnings;
+
+use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
+use PVE::JSONSchema qw(get_standard_option);
+use PVE::Tools;
+
+use PVE::SectionConfig;
+
+use base qw(PVE::SectionConfig);
+
+my $remote_cfg_filename = 'remote.cfg';
+
+cfs_register_file($remote_cfg_filename,
+		  sub { __PACKAGE__->parse_config(@_); },
+		  sub { __PACKAGE__->write_config(@_); });
+
+my $defaultData = {
+    propertyList => {
+	type => { description => "Remote type." },
+	id => get_standard_option('pve-node', {
+	    description => "Remote identifier.",
+	}),
+	comment => {
+	    description => "Description.",
+	    type => 'string',
+	    optional => 1,
+	    maxLength => 4096,
+	},
+    },
+};
+
+sub private {
+    return $defaultData;
+}
+
+sub parse_section_header {
+    my ($class, $line) = @_;
+
+    if ($line =~ m/^(\S+):\s*(\S+)\s*$/) {
+	my ($type, $id) = (lc($1), lc($2));
+	my $errmsg = undef; # set if you want to skip whole section
+	eval { PVE::JSONSchema::pve_verify_node_name($id); };
+	$errmsg = $@ if $@;
+	my $config = {};
+	return ($type, $id, $errmsg, $config);
+    }
+    return undef;
+}
+
+sub decode_value {
+    my ($class, $type, $key, $value) = @_;
+
+    my $def = $defaultData->{plugindata}->{$type};
+
+    if ($key eq 'nodes') {
+	my $res = {};
+
+	foreach my $node (PVE::Tools::split_list($value)) {
+	    if (PVE::JSONSchema::pve_verify_node_name($node)) {
+		$res->{$node} = 1;
+	    }
+	}
+
+	return $res;
+    }
+
+    return $value;
+}
+
+sub encode_value {
+    my ($class, $type, $key, $value) = @_;
+
+    if ($key eq 'nodes') {
+        return join(',', keys(%$value));
+    }
+
+    return $value;
+}
+
+sub parse_config {
+    my ($class, $filename, $raw) = @_;
+
+    my $cfg = $class->SUPER::parse_config($filename, $raw);
+
+    foreach my $id (sort keys %{$cfg->{ids}}) {
+	my $data = $cfg->{ids}->{$id};
+
+	if ($data->{type} eq 'cluster') {
+	    my $nodes = $data->{nodes};
+	    foreach my $node (keys %$nodes) {
+		my $node_data = $cfg->{ids}->{$node};
+		if (!defined($node_data)) {
+		    warn "Ignoring undefined remote node '$node' in remote cluster '$id'!\n";
+		    delete $nodes->{$node};
+		}
+	    }
+	}
+
+	$data->{comment} = PVE::Tools::decode_text($data->{comment})
+	    if defined($data->{comment});
+   }
+
+    return $cfg;
+}
+
+sub write_config {
+    my ($class, $filename, $cfg) = @_;
+
+    my $target_hash = {};
+
+    foreach my $id (keys %{$cfg->{ids}}) {
+	my $data = $cfg->{ids}->{$id};
+
+	if ($data->{type} eq 'cluster') {
+	    my $nodes = $data->{nodes};
+	    foreach my $node (keys %$nodes) {
+		my $node_data = $cfg->{ids}->{$node};
+		if (!defined($node_data)) {
+		    warn "Ignoring undefined remote node '$node' in remote cluster '$id'!\n";
+		    delete $nodes->{$node};
+		}
+	    }
+	}
+
+	$data->{comment} = PVE::Tools::encode_text($data->{comment})
+	    if defined($data->{comment});
+    }
+
+    return $class->SUPER::write_config($filename, $cfg);
+}
+
+sub new {
+    my ($type) = @_;
+
+    my $class = ref($type) || $type;
+
+    my $cfg = cfs_read_file($remote_cfg_filename);
+
+    return bless $cfg, $class;
+}
+
+sub write {
+    my ($cfg) = @_;
+
+    cfs_write_file($remote_cfg_filename, $cfg);
+}
+
+sub lock {
+    my ($code, $errmsg) = @_;
+
+    cfs_lock_file($remote_cfg_filename, undef, $code);
+    my $err = $@;
+    if ($err) {
+	$errmsg ? die "$errmsg: $err" : die $err;
+    }
+}
+
+package PVE::RemoteConfig::Cluster;
+
+use PVE::RemoteConfig;
+use base qw(PVE::RemoteConfig);
+
+sub type {
+    return 'pvecluster';
+}
+
+sub properties {
+    return {
+	nodes => {
+	    description => "Cluster nodes.",
+	    type => 'string', format => 'pve-node-list',
+	},
+	token => {
+	    description => "PVE API Token",
+	    type => 'string',
+	},
+    };
+}
+
+sub options {
+    return {
+	nodes => { optional => 0 },
+	comment => { optional => 1 },
+	token => { optional => 1 },
+    };
+}
+
+package PVE::RemoteConfig::Node;
+
+use PVE::JSONSchema qw(get_standard_option);
+
+use PVE::RemoteConfig;
+use base qw(PVE::RemoteConfig);
+
+sub type {
+    return 'pvenode';
+}
+
+sub properties {
+    return {
+	endpoint => {
+	    description => "Remote IP/FQDN.",
+	    type => 'string',
+	},
+	fingerprint => get_standard_option('fingerprint-sha256'),
+    };
+}
+
+sub options {
+    return {
+	endpoint => { optional => 0 },
+	fingerprint => { optional => 1 },
+	token => { optional => 1 },
+	comment => { optional => 1 },
+    };
+}
+
+
+PVE::RemoteConfig::Cluster->register();
+PVE::RemoteConfig::Node->register();
+PVE::RemoteConfig->init();
+
+1;
diff --git a/debian/libpve-cluster-perl.install b/debian/libpve-cluster-perl.install
index 51223f9..0610384 100644
--- a/debian/libpve-cluster-perl.install
+++ b/debian/libpve-cluster-perl.install
@@ -1,5 +1,6 @@
 usr/share/man/man5/datacenter.cfg.5
 usr/share/perl5/PVE/Corosync.pm
 usr/share/perl5/PVE/DataCenterConfig.pm
+usr/share/perl5/PVE/RemoteConfig.pm
 usr/share/perl5/PVE/RRD.pm
 usr/share/perl5/PVE/SSHInfo.pm
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH cluster 2/4] add get_remote_info
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (6 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 1/4] remote.cfg: add new config file Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-18 17:07   ` Thomas Lamprecht
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 3/4] remote: add option/completion Fabian Grünbichler
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

as a unified helper for talking to a remote node. if the requested node
has an entry in the remote config, the information from that entry is
used.  else, the first locally defined node of the requested cluster is
used as proxy.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 data/PVE/RemoteConfig.pm | 55 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/data/PVE/RemoteConfig.pm b/data/PVE/RemoteConfig.pm
index 23274de..7c395ba 100644
--- a/data/PVE/RemoteConfig.pm
+++ b/data/PVE/RemoteConfig.pm
@@ -3,6 +3,7 @@ package PVE::RemoteConfig;
 use strict;
 use warnings;
 
+use PVE::APIClient::LWP;
 use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
 use PVE::JSONSchema qw(get_standard_option);
 use PVE::Tools;
@@ -158,6 +159,60 @@ sub lock {
     }
 }
 
+# will attempt to connect with node's locally defined endpoint if possible
+sub get_remote_info {
+    my ($self, $cluster, $node, $network_cidr) = @_;
+
+    my $cluster_info = $self->{ids}->{$cluster};
+    die "Remote cluster '$cluster' is not defined!\n"
+	if !defined($cluster_info) || $cluster_info->{type} ne 'pvecluster';
+
+    my $host = $node;
+
+    # fallback to random node/endpoint if $node is not locally defined
+    if (!$cluster_info->{nodes}->{$node}) {
+	my @defined_nodes = keys %{$cluster_info->{nodes}};
+	$host = $defined_nodes[0];
+    }
+
+    my $api_node = $self->{ids}->{$host};
+
+    my $api_token = $cluster_info->{token} // $api_node->{token};
+
+    my $conn_args = {
+	username => 'root@pam',
+	protocol => 'https',
+	host => $api_node->{endpoint},
+	apitoken => $api_token,
+	port => 8006,
+    };
+
+    if (my $fp = $api_node->{fingerprint}) {
+	$conn_args->{cached_fingerprints} = { uc($fp) => 1 };
+    } else {
+	# FIXME add proper parameter to APIClient
+	die "IMPLEMENT ME";
+	my $ssl_opts = {
+	    verify_hostname => 1,
+#	    SSL_ca_path => '/etc/ssl/certs',
+	    SSL_verify_callback => 1,
+	};
+    }
+
+    print "Establishing API connection with cluster '$cluster' node '$host'\n";
+
+    my $conn = PVE::APIClient::LWP->new(%$conn_args);
+
+
+    my $args = {};
+    $args->{cidr} = $network_cidr if $network_cidr;
+
+    print "Request IP information of node '$node'\n";
+    my $res = $conn->get("/nodes/$node/addr", $args);
+
+    return ($res, $conn_args);
+}
+
 package PVE::RemoteConfig::Cluster;
 
 use PVE::RemoteConfig;
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH cluster 3/4] remote: add option/completion
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (7 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 2/4] add get_remote_info Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 4/4] get_remote_info: also return FP if available Fabian Grünbichler
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 data/PVE/RemoteConfig.pm | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/data/PVE/RemoteConfig.pm b/data/PVE/RemoteConfig.pm
index 7c395ba..563e5c1 100644
--- a/data/PVE/RemoteConfig.pm
+++ b/data/PVE/RemoteConfig.pm
@@ -5,7 +5,7 @@ use warnings;
 
 use PVE::APIClient::LWP;
 use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
-use PVE::JSONSchema qw(get_standard_option);
+use PVE::JSONSchema qw(get_standard_option register_standard_option);
 use PVE::Tools;
 
 use PVE::SectionConfig;
@@ -33,6 +33,11 @@ my $defaultData = {
     },
 };
 
+register_standard_option('pve-remote-cluster', {
+    description => "A remote cluster identifier.",
+    type => 'string', format => 'pve-node',
+});
+
 sub private {
     return $defaultData;
 }
@@ -180,7 +185,6 @@ sub get_remote_info {
     my $api_token = $cluster_info->{token} // $api_node->{token};
 
     my $conn_args = {
-	username => 'root@pam',
 	protocol => 'https',
 	host => $api_node->{endpoint},
 	apitoken => $api_token,
@@ -213,6 +217,18 @@ sub get_remote_info {
     return ($res, $conn_args);
 }
 
+sub complete_remote_cluster {
+    my $conf = PVE::RemoteConfig->new();
+    my $sections = $conf->{ids};
+    return [ grep { $sections->{$_}->{type} eq 'pvecluster' } keys %$sections ];
+}
+
+sub complete_remote_node {
+    my $conf = PVE::RemoteConfig->new();
+    my $sections = $conf->{ids};
+    return [ grep { $sections->{$_}->{type} eq 'pvenode' } keys %$sections ];
+}
+
 package PVE::RemoteConfig::Cluster;
 
 use PVE::RemoteConfig;
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH cluster 4/4] get_remote_info: also return FP if available
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (8 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 3/4] remote: add option/completion Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH common 1/2] schema: pull out abstract 'id-pair' verifier Fabian Grünbichler
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 data/PVE/RemoteConfig.pm | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/data/PVE/RemoteConfig.pm b/data/PVE/RemoteConfig.pm
index 563e5c1..c4b8499 100644
--- a/data/PVE/RemoteConfig.pm
+++ b/data/PVE/RemoteConfig.pm
@@ -191,7 +191,8 @@ sub get_remote_info {
 	port => 8006,
     };
 
-    if (my $fp = $api_node->{fingerprint}) {
+    my $fp;
+    if ($fp = $api_node->{fingerprint}) {
 	$conn_args->{cached_fingerprints} = { uc($fp) => 1 };
     } else {
 	# FIXME add proper parameter to APIClient
@@ -207,14 +208,24 @@ sub get_remote_info {
 
     my $conn = PVE::APIClient::LWP->new(%$conn_args);
 
-
     my $args = {};
     $args->{cidr} = $network_cidr if $network_cidr;
 
-    print "Request IP information of node '$node'\n";
-    my $res = $conn->get("/nodes/$node/addr", $args);
+    print "Requesting IP information of node '$node'\n";
+    my $ips = $conn->get("/nodes/$node/addr", $args);
+
+    if ($host ne $node) {
+	print "Requesting certificate information of node '$node'\n";
+	my $cert_info = $conn->get("/nodes/$node/certificates/info");
+	foreach my $cert (@$cert_info) {
+	    $fp = $cert->{fingerprint} if $cert->{filename} ne 'pve-root-ca.pem';
+	    last if $cert->{filename} eq 'pveproxy-ssl.pem';
+	}
+    }
+
+    $fp = uc($fp) if $fp;
 
-    return ($res, $conn_args);
+    return ($ips, $fp, $conn_args);
 }
 
 sub complete_remote_cluster {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH common 1/2] schema: pull out abstract 'id-pair' verifier
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (9 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 4/4] get_remote_info: also return FP if available Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-16 10:24   ` [pve-devel] applied: " Thomas Lamprecht
  2021-04-13 12:16 ` [pve-devel] [PATCH common 2/2] schema: add pve-bridge-id option/format/pair Fabian Grünbichler
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

we'll need another one for guest bridge IDs

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 src/PVE/JSONSchema.pm | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
index 4864549..f2ddb50 100644
--- a/src/PVE/JSONSchema.pm
+++ b/src/PVE/JSONSchema.pm
@@ -273,20 +273,26 @@ sub parse_idmap {
     return $map;
 }
 
-register_format('storagepair', \&verify_storagepair);
-sub verify_storagepair {
-    my ($storagepair, $noerr) = @_;
+my $verify_idpair = sub {
+    my ($input, $noerr, $format) = @_;
 
-    # note: this only checks a single list entry
-    # when using a storagepair-list map, you need to pass the full
-    # parameter to parse_idmap
-    eval { parse_idmap($storagepair, 'pve-storage-id') };
+    eval { parse_idmap($input, $format) };
     if ($@) {
 	return undef if $noerr;
 	die "$@\n";
     }
 
-    return $storagepair;
+    return $input;
+};
+
+# note: this only checks a single list entry
+# when using a storagepair-list map, you need to pass the full parameter to
+# parse_idmap
+register_format('storagepair', \&verify_storagepair);
+sub verify_storagepair {
+    my ($storagepair, $noerr) = @_;
+    return $verify_idpair->($storagepair, $noerr, 'pve-storage-id');
+}
 }
 
 register_format('mac-addr', \&pve_verify_mac_addr);
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH common 2/2] schema: add pve-bridge-id option/format/pair
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (10 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH common 1/2] schema: pull out abstract 'id-pair' verifier Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-16  9:53   ` Thomas Lamprecht
  2021-04-13 12:16 ` [pve-devel] [PATCH guest-common] migrate: handle migration_network with remote migration Fabian Grünbichler
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

for re-use in qemu-server/pve-container, which already have this option
duplicated. the '-pair' is needed for remote migration, but can also be
a nice addition to regular intra-cluster migration to lift the
restriction of having identically named bridges.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 src/PVE/JSONSchema.pm | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
index f2ddb50..bf30b33 100644
--- a/src/PVE/JSONSchema.pm
+++ b/src/PVE/JSONSchema.pm
@@ -82,6 +82,12 @@ register_standard_option('pve-storage-id', {
     type => 'string', format => 'pve-storage-id',
 });
 
+register_standard_option('pve-bridge-id', {
+    description => "Bridge to attach guest network devices to.",
+    type => 'string', format => 'pve-bridge-id',
+    format_description => 'bridge',
+});
+
 register_standard_option('pve-config-digest', {
     description => 'Prevent changes if current configuration file has different SHA1 digest. This can be used to prevent concurrent modifications.',
     type => 'string',
@@ -193,6 +199,17 @@ sub parse_storage_id {
     return parse_id($storeid, 'storage', $noerr);
 }
 
+PVE::JSONSchema::register_format('pve-bridge-id', \&parse_bridge_id);
+sub parse_bridge_id {
+    my ($id, $noerr) = @_;
+
+    if ($id !~ m/^[-_.\w\d]+$/) {
+	return undef if $noerr;
+	die "invalid bridge ID '$id'\n";
+    }
+    return $id;
+}
+
 PVE::JSONSchema::register_format('acme-plugin-id', \&parse_acme_plugin_id);
 sub parse_acme_plugin_id {
     my ($pluginid, $noerr) = @_;
@@ -293,6 +310,14 @@ sub verify_storagepair {
     my ($storagepair, $noerr) = @_;
     return $verify_idpair->($storagepair, $noerr, 'pve-storage-id');
 }
+
+# note: this only checks a single list entry
+# when using a bridgepair-list map, you need to pass the full parameter to
+# parse_idmap
+register_format('bridgepair', \&verify_bridgepair);
+sub verify_bridgepair {
+    my ($bridgepair, $noerr) = @_;
+    return $verify_idpair->($bridgepair, $noerr, 'pve-bridge-id');
 }
 
 register_format('mac-addr', \&pve_verify_mac_addr);
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH guest-common] migrate: handle migration_network with remote migration
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (11 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH common 2/2] schema: add pve-bridge-id option/format/pair Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH manager] API: add node address(es) API endpoint Fabian Grünbichler
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

we only want to use an explicitly provided migration network, not one
for the local cluster.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 PVE/AbstractMigrate.pm | 51 +++++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/PVE/AbstractMigrate.pm b/PVE/AbstractMigrate.pm
index af2be38..ec60b82 100644
--- a/PVE/AbstractMigrate.pm
+++ b/PVE/AbstractMigrate.pm
@@ -115,22 +115,27 @@ sub migrate {
 
     $class = ref($class) || $class;
 
-    my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
+    my ($ssh_info, $rem_ssh);
+    if (!$opts->{remote}) {
+	my $dc_conf = PVE::Cluster::cfs_read_file('datacenter.cfg');
 
-    my $migration_network = $opts->{migration_network};
-    if (!defined($migration_network)) {
-	$migration_network = $dc_conf->{migration}->{network};
-    }
-    my $ssh_info = PVE::SSHInfo::get_ssh_info($node, $migration_network);
-    $nodeip = $ssh_info->{ip};
-
-    my $migration_type = 'secure';
-    if (defined($opts->{migration_type})) {
-	$migration_type = $opts->{migration_type};
-    } elsif (defined($dc_conf->{migration}->{type})) {
-        $migration_type = $dc_conf->{migration}->{type};
+	my $migration_network = $opts->{migration_network};
+	if (!defined($migration_network)) {
+	    $migration_network = $dc_conf->{migration}->{network};
+	}
+	$ssh_info = PVE::SSHInfo::get_ssh_info($node, $migration_network);
+	$nodeip = $ssh_info->{ip};
+
+	my $migration_type = 'secure';
+	if (defined($opts->{migration_type})) {
+	    $migration_type = $opts->{migration_type};
+	} elsif (defined($dc_conf->{migration}->{type})) {
+	    $migration_type = $dc_conf->{migration}->{type};
+	}
+	$opts->{migration_type} = $migration_type;
+	$opts->{migration_network} = $migration_network;
+	$rem_ssh = PVE::SSHInfo::ssh_info_to_command($ssh_info);
     }
-    $opts->{migration_type} = $migration_type;
 
     my $self = {
 	delayed_interrupt => 0,
@@ -139,7 +144,7 @@ sub migrate {
 	node => $node,
 	ssh_info => $ssh_info,
 	nodeip => $nodeip,
-	rem_ssh => PVE::SSHInfo::ssh_info_to_command($ssh_info)
+	rem_ssh => $rem_ssh,
     };
 
     $self = bless $self, $class;
@@ -162,15 +167,21 @@ sub migrate {
 	&$eval_int($self, sub { $self->{running} = $self->prepare($self->{vmid}); });
 	die $@ if $@;
 
-	if (defined($migration_network)) {
+	if (defined($self->{opts}->{migration_network})) {
 	    $self->log('info', "use dedicated network address for sending " .
 	               "migration traffic ($self->{nodeip})");
 
 	    # test if we can connect to new IP
-	    my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
-	    eval { $self->cmd_quiet($cmd); };
-	    die "Can't connect to destination address ($self->{nodeip}) using " .
-	        "public key authentication\n" if $@;
+	    if ($self->{opts}->{remote}) {
+		eval { $self->{opts}->{remote}->{client}->get("/") };
+		die "Can't connect to destination address ($self->{nodeip}) using " .
+		    "API connection - $@\n" if $@;
+	    } else {
+		my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
+		eval { $self->cmd_quiet($cmd); };
+		die "Can't connect to destination address ($self->{nodeip}) using " .
+		    "public key authentication\n" if $@;
+	    }
 	}
 
 	&$eval_int($self, sub { $self->phase1($self->{vmid}); });
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH manager] API: add node address(es) API endpoint
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (12 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH guest-common] migrate: handle migration_network with remote migration Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-16 10:17   ` Thomas Lamprecht
  2021-04-13 12:16 ` [pve-devel] [PATCH storage] import: allow import from UNIX socket Fabian Grünbichler
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 PVE/API2/Nodes.pm | 70 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
index ba6621c6..b30d0739 100644
--- a/PVE/API2/Nodes.pm
+++ b/PVE/API2/Nodes.pm
@@ -222,6 +222,7 @@ __PACKAGE__->register_method ({
 	my ($param) = @_;
 
 	my $result = [
+	    { name => 'addr' },
 	    { name => 'aplinfo' },
 	    { name => 'apt' },
 	    { name => 'capabilities' },
@@ -2183,6 +2184,75 @@ __PACKAGE__->register_method ({
 	return undef;
     }});
 
+__PACKAGE__->register_method ({
+    name => 'get_node_addr',
+    path => 'addr',
+    method => 'GET',
+    proxyto => 'node',
+    permissions => {
+	check => ['perm', '/', [ 'Sys.Audit' ]],
+    },
+    description => "Get the content of /etc/hosts.",
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    cidr => {
+		type => 'string',
+		format => 'CIDR',
+		format_description => 'CIDR',
+		description => 'Extra network for which to retrieve local address(es).',
+		optional => 1,
+	    },
+	},
+    },
+    returns => {
+	type => 'object',
+	properties => {
+	    default => {
+		type => 'string',
+		description => 'Default node IP.',
+		format => 'ip',
+	    },
+	    migration => {
+		type => 'array',
+		items => {
+		    type => 'string',
+		    description => 'Migration network IP(s).',
+		    format => 'ip',
+		},
+		optional => 1,
+	    },
+	    extra => {
+		type => 'array',
+		items => {
+		    type => 'string',
+		    description => 'Extra network IP(s).',
+		    format => 'ip',
+		},
+		optional => 1,
+	    },
+	},
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $data = {};
+
+	my $default = PVE::Cluster::remote_node_ip($param->{node});
+
+	my $dc_conf = cfs_read_file('datacenter.cfg');
+	my $migration = $dc_conf->{migration}->{network};
+
+	$data->{default} = $default if defined($default);
+	$data->{migration} = PVE::Network::get_local_ip_from_cidr($migration, 1)
+	    if $migration;
+	$data->{extra} = PVE::Network::get_local_ip_from_cidr($param->{cidr}, 1)
+	    if $param->{cidr};
+
+	return $data;
+    }});
+
 # bash completion helper
 
 sub complete_templet_repo {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH storage] import: allow import from UNIX socket
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (13 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH manager] API: add node address(es) API endpoint Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-16 10:24   ` [pve-devel] applied: " Thomas Lamprecht
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 1/7] migrate: factor out storage checks Fabian Grünbichler
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

this allows forwarding over websockets without requiring a (free) port.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 PVE/CLI/pvesm.pm | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/PVE/CLI/pvesm.pm b/PVE/CLI/pvesm.pm
index 7b46897..9206188 100755
--- a/PVE/CLI/pvesm.pm
+++ b/PVE/CLI/pvesm.pm
@@ -7,6 +7,10 @@ use POSIX qw(O_RDONLY O_WRONLY O_CREAT O_TRUNC);
 use Fcntl ':flock';
 use File::Path;
 
+use IO::Socket::IP;
+use IO::Socket::UNIX;
+use Socket qw(SOCK_STREAM);
+
 use PVE::SafeSyslog;
 use PVE::Cluster;
 use PVE::INotify;
@@ -314,7 +318,8 @@ __PACKAGE__->register_method ({
 	    },
 	    filename => {
 		description => "Source file name. For '-' stdin is used, the " .
-		  "tcp://<IP-or-CIDR> format allows to use a TCP connection as input. " .
+		  "tcp://<IP-or-CIDR> format allows to use a TCP connection, " .
+		  "the unix://PATH-TO-SOCKET format a UNIX socket as input." .
 		  "Else, the file is treated as common file.",
 		type => 'string',
 	    },
@@ -392,6 +397,25 @@ __PACKAGE__->register_method ({
 	    alarm $prev_alarm;
 	    close($socket);
 
+	    $infh = \*$client;
+	} elsif ($filename =~ m!^unix://(.*)$!) {
+	    my $socket_path = $1;
+	    my $socket = IO::Socket::UNIX->new(
+		Type => SOCK_STREAM(),
+		Local => $socket_path,
+		Listen => 1,
+	    ) or die "failed to open socket: $!\n";
+
+	    print "ready\n";
+	    *STDOUT->flush();
+
+	    my $prev_alarm = alarm 0;
+	    local $SIG{ALRM} = sub { die "timed out waiting for client\n" };
+	    alarm 30;
+	    my $client = $socket->accept; # Wait for a client
+	    alarm $prev_alarm;
+	    close($socket);
+
 	    $infh = \*$client;
 	} else {
 	    sysopen($infh, $filename, O_RDONLY)
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH qemu-server 1/7] migrate: factor out storage checks
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (14 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH storage] import: allow import from UNIX socket Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 2/7] refactor map_storage to map_id Fabian Grünbichler
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

to re-use them for incoming remote migrations.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 PVE/API2/Qemu.pm | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index c56b609..a789456 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -133,6 +133,18 @@ my $check_storage_access_clone = sub {
    return $sharedvm;
 };
 
+my $check_storage_access_migrate = sub {
+    my ($rpcenv, $authuser, $storecfg, $storage, $node) = @_;
+
+    PVE::Storage::storage_check_node($storecfg, $storage, $node);
+
+    $rpcenv->check($authuser, "/storage/$storage", ['Datastore.AllocateSpace']);
+
+    my $scfg = PVE::Storage::storage_config($storecfg, $storage);
+    die "storage '$storage' does not support vm images\n"
+	if !$scfg->{content}->{images};
+};
+
 # Note: $pool is only needed when creating a VM, because pool permissions
 # are automatically inherited if VM already exists inside a pool.
 my $create_disks = sub {
@@ -3684,17 +3696,7 @@ __PACKAGE__->register_method({
 	}
 
 	my $storecfg = PVE::Storage::config();
-
 	if (my $targetstorage = $param->{targetstorage}) {
-	    my $check_storage = sub {
-		my ($target_sid) = @_;
-		PVE::Storage::storage_check_node($storecfg, $target_sid, $target);
-		$rpcenv->check($authuser, "/storage/$target_sid", ['Datastore.AllocateSpace']);
-		my $scfg = PVE::Storage::storage_config($storecfg, $target_sid);
-		raise_param_exc({ targetstorage => "storage '$target_sid' does not support vm images"})
-		    if !$scfg->{content}->{images};
-	    };
-
 	    my $storagemap = eval { PVE::JSONSchema::parse_idmap($targetstorage, 'pve-storage-id') };
 	    raise_param_exc({ targetstorage => "failed to parse storage map: $@" })
 		if $@;
@@ -3703,10 +3705,10 @@ __PACKAGE__->register_method({
 		if !defined($storagemap->{identity});
 
 	    foreach my $target_sid (values %{$storagemap->{entries}}) {
-		$check_storage->($target_sid);
+		$check_storage_access_migrate->($rpcenv, $authuser, $storecfg, $target_sid, $target);
 	    }
 
-	    $check_storage->($storagemap->{default})
+	    $check_storage_access_migrate->($rpcenv, $authuser, $storecfg, $storagemap->{default}, $target)
 		if $storagemap->{default};
 
 	    PVE::QemuServer::check_storage_availability($storecfg, $conf, $target)
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH qemu-server 2/7] refactor map_storage to map_id
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (15 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 1/7] migrate: factor out storage checks Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 3/7] schema: use pve-bridge-id Fabian Grünbichler
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

since we are going to reuse the same mechanism/code for network bridge
mapping.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 PVE/QemuMigrate.pm | 8 ++++----
 PVE/QemuServer.pm  | 6 ++++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 5c019fc..eb95762 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -336,7 +336,7 @@ sub prepare {
 	my ($sid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
 
 	# check if storage is available on both nodes
-	my $targetsid = PVE::QemuServer::map_storage($self->{opts}->{storagemap}, $sid);
+	my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
 
 	my $scfg = PVE::Storage::storage_check_node($self->{storecfg}, $sid);
 	PVE::Storage::storage_check_node($self->{storecfg}, $targetsid, $self->{node});
@@ -398,7 +398,7 @@ sub sync_disks {
 
 	    next if @{$dl->{$storeid}} == 0;
 
-	    my $targetsid = PVE::QemuServer::map_storage($self->{opts}->{storagemap}, $storeid);
+	    my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $storeid);
 	    # check if storage is available on target node
 	    PVE::Storage::storage_check_node($storecfg, $targetsid, $self->{node});
 
@@ -451,7 +451,7 @@ sub sync_disks {
 
 	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
 
-	    my $targetsid = PVE::QemuServer::map_storage($self->{opts}->{storagemap}, $sid);
+	    my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
 	    # check if storage is available on both nodes
 	    my $scfg = PVE::Storage::storage_check_node($storecfg, $sid);
 	    PVE::Storage::storage_check_node($storecfg, $targetsid, $self->{node});
@@ -612,7 +612,7 @@ sub sync_disks {
 
 	foreach my $volid (sort keys %$local_volumes) {
 	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
-	    my $targetsid = PVE::QemuServer::map_storage($self->{opts}->{storagemap}, $sid);
+	    my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
 	    my $ref = $local_volumes->{$volid}->{ref};
 	    if ($self->{running} && $ref eq 'config') {
 		push @{$self->{online_local_volumes}}, $volid;
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index fdb2ac9..685a191 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -102,7 +102,9 @@ PVE::JSONSchema::register_standard_option('pve-qemu-machine', {
 });
 
 
-sub map_storage {
+# maps source to target ID
+# currently used for targetstorage and targetbridge when migrating
+sub map_id {
     my ($map, $source) = @_;
 
     return $source if !defined($map);
@@ -4976,7 +4978,7 @@ sub vm_migrate_alloc_nbd_disks {
 	# volume is not available there, fall back to the default format.
 	# Otherwise use the same format as the original.
 	if (!$storagemap->{identity}) {
-	    $storeid = map_storage($storagemap, $storeid);
+	    $storeid = map_id($storagemap, $storeid);
 	    my ($defFormat, $validFormats) = PVE::Storage::storage_default_format($storecfg, $storeid);
 	    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
 	    my $fileFormat = qemu_img_format($scfg, $volname);
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH qemu-server 3/7] schema: use pve-bridge-id
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (16 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 2/7] refactor map_storage to map_id Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 4/7] mtunnel: add API endpoints Fabian Grünbichler
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    requires pve-common with pve-bridge-id

 PVE/QemuServer.pm | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 685a191..d323d3d 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -864,13 +864,10 @@ my $net_fmt = {
         default_key => 1,
     },
     (map { $_ => { keyAlias => 'model', alias => 'macaddr' }} @$nic_model_list),
-    bridge => {
-	type => 'string',
+    bridge => get_standard_option('pve-bridge-id', {
 	description => $net_fmt_bridge_descr,
-	format_description => 'bridge',
-	pattern => '[-_.\w\d]+',
 	optional => 1,
-    },
+    }),
     queues => {
 	type => 'integer',
 	minimum => 0, maximum => 16,
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH qemu-server 4/7] mtunnel: add API endpoints
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (17 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 3/7] schema: use pve-bridge-id Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 5/7] migrate: refactor remote VM/tunnel start Fabian Grünbichler
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

the following two endpoints are used for migration on the remote side

POST /nodes/NODE/qemu/VMID/mtunnel

which creates and locks an empty VM config, and spawns the main qmtunnel
worker which binds to a VM-specific UNIX socket.

this worker handles JSON-encoded migration commands coming in via this
UNIX socket:
- config (set target VM config)
-- checks permissions for updating config
-- strips pending changes and snapshots
- disk (allocate disk for NBD migration)
-- checks permission for target storage
-- returns drive string for allocated volume
- disk-import (import 'pvesm export' stream for offline migration)
-- checks permission for target storage
-- forks a child running 'pvesm import' reading from a UNIX socket
-- only one import allowed to run at any given moment
- query-disk-import
-- checks output of 'pvesm import' for volume ID message
-- returns volid + success, or 'pending', or 'error'
- start (returning migration info)
- resume
- stop
- nbdstop
- unlock
- quit (+ cleanup)

this worker serves as a replacement for both 'qm mtunnel' and various
manual calls via SSH. the API call will return a ticket valid for
connecting to the worker's UNIX socket via a websocket connection.

GET+WebSocket upgrade /nodes/NODE/qemu/VMID/mtunnelwebsocket

gets called for connecting to a UNIX socket via websocket forwarding,
i.e. once for the main command mtunnel, and once each for the memory
migration and each NBD drive-mirror/storage migration.

access is guarded by a short-lived ticket binding the authenticated user
to the socket path. such tickets can be requested over the main mtunnel,
which keeps track of socket paths currently used by that
mtunnel/migration instance.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    requires
    - pve-storage with UNIX import support
    - pve-access-control with tunnel ticket support

 PVE/API2/Qemu.pm | 548 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 548 insertions(+)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index a789456..bf5ca14 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -6,8 +6,13 @@ use Cwd 'abs_path';
 use Net::SSLeay;
 use POSIX;
 use IO::Socket::IP;
+use IO::Socket::UNIX;
+use IPC::Open3;
+use JSON;
+use MIME::Base64;
 use URI::Escape;
 use Crypt::OpenSSL::Random;
+use Socket qw(SOCK_STREAM);
 
 use PVE::Cluster qw (cfs_read_file cfs_write_file);;
 use PVE::RRD;
@@ -848,6 +853,7 @@ __PACKAGE__->register_method({
 	    { subdir => 'spiceproxy' },
 	    { subdir => 'sendkey' },
 	    { subdir => 'firewall' },
+	    { subdir => 'mtunnel' },
 	    ];
 
 	return $res;
@@ -4397,4 +4403,546 @@ __PACKAGE__->register_method({
 	return PVE::QemuServer::Cloudinit::dump_cloudinit_config($conf, $param->{vmid}, $param->{type});
     }});
 
+__PACKAGE__->register_method({
+    name => 'mtunnel',
+    path => '{vmid}/mtunnel',
+    method => 'POST',
+    protected => 1,
+    proxyto => 'node',
+    description => 'Migration tunnel endpoint - only for internal use by VM migration.',
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.Allocate' ]],
+	description => "You need 'VM.Allocate' permissions on /vms/{vmid}. Further permission checks happen during the actual migration.",
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid'),
+	    storages => {
+		type => 'string',
+		format => 'pve-storage-id-list',
+		optional => 1,
+		description => 'List of storages to check permission and availability. Will be checked again for all actually used storages during migration.',
+	    },
+	},
+    },
+    returns => {
+	additionalProperties => 0,
+	properties => {
+	    upid => { type => 'string' },
+	    ticket => { type => 'string' },
+	    socket => { type => 'string' },
+	},
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $node = extract_param($param, 'node');
+	my $vmid = extract_param($param, 'vmid');
+
+	my $storages = extract_param($param, 'storages');
+
+	my $storecfg = PVE::Storage::config();
+	foreach my $storeid (PVE::Tools::split_list($storages)) {
+	    $check_storage_access_migrate->($rpcenv, $authuser, $storecfg, $storeid, $node);
+	}
+
+	PVE::Cluster::check_cfs_quorum();
+
+	my $socket_addr = "/run/qemu-server/$vmid.mtunnel";
+
+	my $lock = 'create';
+	eval { PVE::QemuConfig->create_and_lock_config($vmid, 0, $lock); };
+
+	raise_param_exc({ vmid => "unable to create empty VM config - $@"})
+	    if $@;
+
+	my $realcmd = sub {
+	    my $pveproxy_uid;
+
+	    my $state = {
+		storecfg => PVE::Storage::config(),
+		lock => $lock,
+	    };
+
+	    my $run_locked = sub {
+		my ($code, $params) = @_;
+		return PVE::QemuConfig->lock_config($vmid, sub {
+		    my $conf = PVE::QemuConfig->load_config($vmid);
+
+		    $state->{conf} = $conf;
+
+		    die "Encountered wrong lock - aborting mtunnel command handling.\n"
+			if $state->{lock} && !PVE::QemuConfig->has_lock($conf, $state->{lock});
+
+		    return $code->($params);
+		});
+	    };
+
+	    my $cmd_desc = {
+		config => {
+		    conf => {
+			type => 'object',
+			description => 'Full VM config, adapted for target cluster/node',
+		    },
+		},
+		disk => {
+		    format => PVE::JSONSchema::get_standard_option('pve-qm-image-format'),
+		    storage => {
+			type => 'string',
+			format => 'pve-storage-id',
+		    },
+		    drive => {
+			type => 'object',
+			description => 'parsed drive information without volid and format',
+		    },
+		},
+		'disk-import' => {
+		    volname => {
+			type => 'string',
+			description => 'volume name to use prefered target volume name',
+		    },
+		    format => PVE::JSONSchema::get_standard_option('pve-qm-image-format'),
+		    'export-formats' => {
+			type => 'string',
+			description => 'list of supported export formats',
+		    },
+		    storage => {
+			type => 'string',
+			format => 'pve-storage-id',
+		    },
+		    'with-snapshots' => {
+			description =>
+			    "Whether the stream includes intermediate snapshots",
+			type => 'boolean',
+			optional => 1,
+			default => 0,
+		    },
+		    'allow-rename' => {
+			description => "Choose a new volume ID if the requested " .
+			  "volume ID already exists, instead of throwing an error.",
+			type => 'boolean',
+			optional => 1,
+			default => 0,
+		    },
+		},
+		start => {
+		    start_params => {
+			type => 'object',
+			description => 'params passed to vm_start_nolock',
+		    },
+		    migrate_opts => {
+			type => 'object',
+			description => 'migrate_opts passed to vm_start_nolock',
+		    },
+		},
+		ticket => {
+		    path => {
+			type => 'string',
+			description => 'socket path for which the ticket should be valid. must be known to current mtunnel instance.',
+		    },
+		},
+		quit => {
+		    cleanup => {
+			type => 'boolean',
+			description => 'remove VM config and disks, aborting migration',
+			default => 0,
+		    },
+		},
+	    };
+
+	    my $cmd_handlers = {
+		'version' => sub {
+		    return {
+			tunnel => "2",
+		    };
+		},
+		'config' => sub {
+		    my ($params) = @_;
+
+		    PVE::QemuConfig->remove_lock($vmid, 'create');
+
+		    my $new_conf = $params->{conf};
+		    delete $new_conf->{lock};
+		    delete $new_conf->{digest};
+
+		    # TODO handle properly?
+		    delete $new_conf->{snapshots};
+		    delete $new_conf->{pending};
+
+		    my $vmgenid = delete $new_conf->{vmgenid};
+
+		    $new_conf->{vmid} = $vmid;
+		    $new_conf->{node} = $node;
+
+		    $update_vm_api->($new_conf, 1);
+
+		    my $conf = PVE::QemuConfig->load_config($vmid);
+		    $conf->{lock} = 'migrate';
+		    $conf->{vmgenid} = $vmgenid;
+		    PVE::QemuConfig->write_config($vmid, $conf);
+
+		    $state->{lock} = 'migrate';
+
+		    return;
+		},
+		'disk' => sub {
+		    my ($params) = @_;
+
+		    my $format = $params->{format};
+		    my $storeid = $params->{storage};
+		    my $drive = $params->{drive};
+
+		    $check_storage_access_migrate->($rpcenv, $authuser, $state->{storecfg}, $storeid, $node);
+
+		    my ($default_format, $valid_formats) = PVE::Storage::storage_default_format($state->{storecfg}, $storeid);
+		    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+		    $format = $default_format
+			if !grep {$format eq $_} @{$valid_formats};
+
+		    my $size = int($drive->{size})/1024;
+		    my $newvolid = PVE::Storage::vdisk_alloc($state->{storecfg}, $storeid, $vmid, $format, undef, $size);
+
+		    my $newdrive = $drive;
+		    $newdrive->{format} = $format;
+		    $newdrive->{file} = $newvolid;
+		    my $drivestr = PVE::QemuServer::print_drive($newdrive);
+		    return {
+			drivestr => $drivestr,
+			volid => $newvolid,
+		    };
+		},
+		'disk-import' => sub {
+		    my ($params) = @_;
+
+		    die "disk import already running as PID '$state->{disk_import}->{pid}'\n"
+			if $state->{disk_import}->{pid};
+
+		    my $format = $params->{format};
+		    my $storeid = $params->{storage};
+		    $check_storage_access_migrate->($rpcenv, $authuser, $state->{storecfg}, $storeid, $node);
+
+		    my $with_snapshots = $params->{'with-snapshots'} ? 1 : 0;
+
+		    my ($default_format, $valid_formats) = PVE::Storage::storage_default_format($state->{storecfg}, $storeid);
+		    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
+		    die "unsupported format '$format' for storage '$storeid'\n"
+			if !grep {$format eq $_} @{$valid_formats};
+
+		    my $volname = $params->{volname};
+
+		    # get target volname, taken from PVE::Storage
+		    (my $name_without_extension = $volname) =~ s/\.$format$//;
+		    if ($scfg->{path}) {
+			$volname = "$vmid/$name_without_extension.$format";
+		    } else {
+			$volname = "$name_without_extension";
+		    }
+
+		    my $migration_snapshot;
+		    if ($scfg->{type} eq 'zfspool') {
+			$migration_snapshot = '__migration__';
+		    }
+
+		    my $volid = "$storeid:$volname";
+
+		    # find common import/export format, taken from PVE::Storage
+		    my @import_formats = PVE::Storage::volume_import_formats($state->{storecfg}, $volid, undef, $with_snapshots);
+		    my @export_formats = PVE::Tools::split_list($params->{'export-formats'});
+		    my %import_hash = map { $_ => 1 } @import_formats;
+		    my @common = grep { $import_hash{$_} } @export_formats;
+		    die "no matching import/export format found for storage '$storeid'\n"
+			if !@common;
+		    $format = $common[0];
+
+		    my $input = IO::File->new();
+		    my $info = IO::File->new();
+		    my $unix = "/run/qemu-server/$vmid.storage";
+
+		    my $import_cmd = ['pvesm', 'import', $volid, $format, "unix://$unix", '-with-snapshots', $with_snapshots];
+		    if ($params->{'allow-rename'}) {
+			push @$import_cmd, '-allow-rename', $params->{'allow-rename'};
+		    }
+		    if ($migration_snapshot) {
+			push @$import_cmd, '-delete-snapshot', $migration_snapshot;
+		    }
+
+		    unlink $unix;
+		    my $cpid = open3($input, $info, $info, @{$import_cmd})
+			or die "failed to spawn disk-import child - $!\n";
+
+		    $state->{disk_import}->{pid} = $cpid;
+		    my $ready;
+		    eval {
+			PVE::Tools::run_with_timeout(5, sub { $ready = <$info>; });
+		    };
+		    die "failed to read readyness from disk import child: $@\n" if $@;
+		    print "$ready\n";
+
+		    chown $pveproxy_uid, -1, $unix;
+
+		    $state->{disk_import}->{fh} = $info;
+		    $state->{disk_import}->{socket} = $unix;
+
+		    $state->{sockets}->{$unix} = 1;
+
+		    return {
+			socket => $unix,
+			format => $format,
+		    };
+		},
+		'query-disk-import' => sub {
+		    my ($params) = @_;
+
+		    die "no disk import running\n"
+			if !$state->{disk_import}->{pid};
+
+		    my $pattern = PVE::Storage::volume_imported_message(undef, 1);
+		    my $result;
+		    eval {
+			my $fh = $state->{disk_import}->{fh};
+			PVE::Tools::run_with_timeout(5, sub { $result = <$fh>; });
+			print "disk-import: $result\n" if $result;
+		    };
+		    if ($result && $result =~ $pattern) {
+			my $volid = $1;
+			waitpid($state->{disk_import}->{pid}, 0);
+
+			my $unix = $state->{disk_import}->{socket};
+			unlink $unix;
+			delete $state->{sockets}->{$unix};
+			delete $state->{disk_import};
+			return {
+			    status => "complete",
+			    volid => $volid,
+			};
+		    } elsif (!$result && waitpid($state->{disk_import}->{pid}, WNOHANG)) {
+			my $unix = $state->{disk_import}->{socket};
+			unlink $unix;
+			delete $state->{sockets}->{$unix};
+			delete $state->{disk_import};
+
+			return {
+			    status => "error",
+			};
+		    } else {
+			return {
+			    status => "pending",
+			};
+		    }
+		},
+		'start' => sub {
+		    my ($params) = @_;
+
+		    my $info = PVE::QemuServer::vm_start_nolock(
+			$state->{storecfg},
+			$vmid,
+			$state->{conf},
+			$params->{start_params},
+			$params->{migrate_opts},
+		    );
+
+
+		    if ($info->{migrate}->{proto} ne 'unix') {
+			PVE::QemuServer::vm_stop(undef, $vmid, 1, 1);
+			die "migration over non-UNIX sockets not possible\n";
+		    }
+
+		    my $socket = $info->{migrate}->{addr};
+		    chown $pveproxy_uid, -1, $socket;
+		    $state->{sockets}->{$socket} = 1;
+
+		    my $unix_sockets = $info->{migrate}->{unix_sockets};
+		    foreach my $socket (@$unix_sockets) {
+			chown $pveproxy_uid, -1, $socket;
+			$state->{sockets}->{$socket} = 1;
+		    }
+		    return $info;
+		},
+		'stop' => sub {
+		    PVE::QemuServer::vm_stop(undef, $vmid, 1, 1);
+		    return;
+		},
+		'nbdstop' => sub {
+		    PVE::QemuServer::nbd_stop($vmid);
+		    return;
+		},
+		'resume' => sub {
+		    if (PVE::QemuServer::check_running($vmid, 1)) {
+			PVE::QemuServer::vm_resume($vmid, 1, 1);
+		    } else {
+			die "VM $vmid not running\n";
+		    }
+		    return;
+		},
+		'unlock' => sub {
+		    PVE::QemuConfig->remove_lock($vmid, $state->{lock});
+		    delete $state->{lock};
+		    return;
+		},
+		'ticket' => sub {
+		    my ($params) = @_;
+
+		    my $path = $params->{path};
+
+		    die "Not allowed to generate ticket for unknown socket '$path'\n"
+			if !defined($state->{sockets}->{$path});
+
+		    return { ticket => PVE::AccessControl::assemble_tunnel_ticket($authuser, "/socket/$path") };
+		},
+		'quit' => sub {
+		    my ($params) = @_;
+
+		    PVE::QemuServer::destroy_vm($state->{storecfg}, $vmid, 1)
+			if $params->{cleanup};
+
+		    $state->{exit} = 1;
+		    return;
+		},
+	    };
+
+	    $run_locked->(sub {
+		my $socket_addr = "/run/qemu-server/$vmid.mtunnel";
+		unlink $socket_addr;
+
+		$state->{socket} = IO::Socket::UNIX->new(
+	            Type => SOCK_STREAM(),
+		    Local => $socket_addr,
+		    Listen => 1,
+		);
+
+		$pveproxy_uid = getpwnam('www-data')
+		    or die "Failed to resolve user 'www-data' to numeric UID\n";
+		chown $pveproxy_uid, -1, $socket_addr;
+	    });
+
+	    print "mtunnel started\n";
+
+	    my $conn = $state->{socket}->accept();
+
+	    $state->{conn} = $conn;
+
+	    my $reply_err = sub {
+		my ($msg) = @_;
+
+		my $reply = JSON::encode_json({
+		    success => JSON::false,
+		    msg => $msg,
+		});
+		$conn->print("$reply\n");
+		$conn->flush();
+	    };
+
+	    my $reply_ok = sub {
+		my ($res) = @_;
+
+		$res->{success} = JSON::true;
+		my $reply = JSON::encode_json($res);
+		$conn->print("$reply\n");
+		$conn->flush();
+	    };
+
+	    while (my $line = <$conn>) {
+		chomp $line;
+
+		# untaint, we validate below if needed
+		($line) = $line =~ /^(.*)$/;
+		print "command received: '$line'\n";
+		my $parsed = eval { JSON::decode_json($line) };
+		if ($@) {
+		    $reply_err->("failed to parse command - $@");
+		    next;
+		}
+
+		my $cmd = delete $parsed->{cmd};
+		if (!defined($cmd)) {
+		    $reply_err->("'cmd' missing");
+		} elsif (my $handler = $cmd_handlers->{$cmd}) {
+		    eval {
+			if ($cmd_desc->{$cmd}) {
+			    PVE::JSONSchema::validate($cmd_desc->{$cmd}, $parsed);
+			} else {
+			    $parsed = {};
+			}
+			my $res = $run_locked->($handler, $parsed);
+			$reply_ok->($res);
+		    };
+		    $reply_err->("failed to handle '$cmd' command - $@")
+			if $@;
+		} else {
+		    $reply_err->("unknown command '$cmd' given");
+		}
+
+		if ($state->{exit}) {
+		    $state->{conn}->close();
+		    $state->{socket}->close();
+		    last;
+		}
+	    }
+
+	    print "mtunnel exited\n";
+	};
+
+	my $ticket = PVE::AccessControl::assemble_tunnel_ticket($authuser, "/socket/$socket_addr");
+	my $upid = $rpcenv->fork_worker('qmtunnel', $vmid, $authuser, $realcmd);
+
+	return {
+	    ticket => $ticket,
+	    upid => $upid,
+	    socket => $socket_addr,
+	};
+    }});
+
+__PACKAGE__->register_method({
+    name => 'mtunnelwebsocket',
+    path => '{vmid}/mtunnelwebsocket',
+    method => 'GET',
+    proxyto => 'node',
+    permissions => {
+	description => "You need to pass a ticket valid for the selected socket. Tickets can be created via the mtunnel API call, which will check permissions accordingly.",
+        user => 'all', # check inside
+    },
+    description => 'Migration tunnel endpoint for websocket upgrade - only for internal use by VM migration.',
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid'),
+	    socket => {
+		type => "string",
+		description => "unix socket to forward to",
+	    },
+	    ticket => {
+		type => "string",
+		description => "ticket return by initial 'mtunnel' API call, or retrieved via 'ticket' tunnel command",
+	    },
+	},
+    },
+    returns => {
+	type => "object",
+	properties => {
+	    port => { type => 'string', optional => 1 },
+	    socket => { type => 'string', optional => 1 },
+	},
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $vmid = $param->{vmid};
+	# check VM exists
+	PVE::QemuConfig->load_config($vmid);
+
+	my $socket = $param->{socket};
+	PVE::AccessControl::verify_tunnel_ticket($param->{ticket}, $authuser, "/socket/$socket");
+
+	return { socket => $socket };
+    }});
+
 1;
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH qemu-server 5/7] migrate: refactor remote VM/tunnel start
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (18 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 4/7] mtunnel: add API endpoints Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 6/7] migrate: add remote migration handling Fabian Grünbichler
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

no semantic changes intended, except for:
- no longer passing the main migration UNIX socket to SSH twice for
forwarding
- dropping the 'unix:' prefix in start_remote_tunnel's timeout error message

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---
 PVE/QemuMigrate.pm | 154 +++++++++++++++++++++++++++------------------
 PVE/QemuServer.pm  |  32 +++++-----
 2 files changed, 110 insertions(+), 76 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index eb95762..5d44c51 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -204,19 +204,24 @@ sub finish_tunnel {
     die $err if $err;
 }
 
+# tunnel_info:
+#   proto: unix (secure) or tcp (insecure/legacy compat)
+#   addr: IP or UNIX socket path
+#   port: optional TCP port
+#   unix_sockets: additional UNIX socket paths to forward
 sub start_remote_tunnel {
-    my ($self, $raddr, $rport, $ruri, $unix_socket_info) = @_;
+    my ($self, $tunnel_info) = @_;
 
     my $nodename = PVE::INotify::nodename();
     my $migration_type = $self->{opts}->{migration_type};
 
     if ($migration_type eq 'secure') {
 
-	if ($ruri =~ /^unix:/) {
-	    my $ssh_forward_info = ["$raddr:$raddr"];
-	    $unix_socket_info->{$raddr} = 1;
+	if ($tunnel_info->{proto} eq 'unix') {
+	    my $ssh_forward_info = [];
 
-	    my $unix_sockets = [ keys %$unix_socket_info ];
+	    my $unix_sockets = [ keys %{$tunnel_info->{unix_sockets}} ];
+	    push @$unix_sockets, $tunnel_info->{addr};
 	    for my $sock (@$unix_sockets) {
 		push @$ssh_forward_info, "$sock:$sock";
 		unlink $sock;
@@ -243,23 +248,23 @@ sub start_remote_tunnel {
 	    if ($unix_socket_try > 100) {
 		$self->{errors} = 1;
 		$self->finish_tunnel($self->{tunnel});
-		die "Timeout, migration socket $ruri did not get ready";
+		die "Timeout, migration socket $tunnel_info->{addr} did not get ready";
 	    }
 	    $self->{tunnel}->{unix_sockets} = $unix_sockets if (@$unix_sockets);
 
-	} elsif ($ruri =~ /^tcp:/) {
+	} elsif ($tunnel_info->{proto} eq 'tcp') {
 	    my $ssh_forward_info = [];
-	    if ($raddr eq "localhost") {
+	    if ($tunnel_info->{addr} eq "localhost") {
 		# for backwards compatibility with older qemu-server versions
 		my $pfamily = PVE::Tools::get_host_address_family($nodename);
 		my $lport = PVE::Tools::next_migrate_port($pfamily);
-		push @$ssh_forward_info, "$lport:localhost:$rport";
+		push @$ssh_forward_info, "$lport:localhost:$tunnel_info->{rporyt}";
 	    }
 
 	    $self->{tunnel} = $self->fork_tunnel($ssh_forward_info);
 
 	} else {
-	    die "unsupported protocol in migration URI: $ruri\n";
+	    die "unsupported protocol in migration URI: $tunnel_info->{proto}\n";
 	}
     } else {
 	#fork tunnel for insecure migration, to send faster commands like resume
@@ -737,48 +742,36 @@ sub phase1_cleanup {
 
 }
 
-sub phase2 {
-    my ($self, $vmid) = @_;
+sub phase2_start_local_cluster {
+    my ($self, $vmid, $params) = @_;
 
     my $conf = $self->{vmconf};
+    my $start = $params->{start_params};
+    my $migrate = $params->{migrate_opts};
 
     $self->log('info', "starting VM $vmid on remote node '$self->{node}'");
 
-    my $raddr;
-    my $rport;
-    my $ruri; # the whole migration dst. URI (protocol:address[:port])
-    my $nodename = PVE::INotify::nodename();
+    my $tunnel_info = {};
 
     ## start on remote node
     my $cmd = [@{$self->{rem_ssh}}];
 
-    my $spice_ticket;
-    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) {
-	my $res = mon_cmd($vmid, 'query-spice');
-	$spice_ticket = $res->{ticket};
-    }
+    push @$cmd, 'qm', 'start', $vmid, '--skiplock';
+    push @$cmd, '--migratedfrom', $migrate->{migratedfrom};
 
-    push @$cmd , 'qm', 'start', $vmid, '--skiplock', '--migratedfrom', $nodename;
+    push @$cmd, '--migration_type', $migrate->{type};
 
-    my $migration_type = $self->{opts}->{migration_type};
+    push @$cmd, '--migration_network', $migrate->{network}
+      if $migrate->{migration_network};
 
-    push @$cmd, '--migration_type', $migration_type;
+    push @$cmd, '--stateuri', $start->{statefile};
 
-    push @$cmd, '--migration_network', $self->{opts}->{migration_network}
-      if $self->{opts}->{migration_network};
-
-    if ($migration_type eq 'insecure') {
-	push @$cmd, '--stateuri', 'tcp';
-    } else {
-	push @$cmd, '--stateuri', 'unix';
+    if ($start->{forcemachine}) {
+	push @$cmd, '--machine', $start->{forcemachine};
     }
 
-    if ($self->{forcemachine}) {
-	push @$cmd, '--machine', $self->{forcemachine};
-    }
-
-    if ($self->{forcecpu}) {
-	push @$cmd, '--force-cpu', $self->{forcecpu};
+    if ($start->{forcecpu}) {
+	push @$cmd, '--force-cpu', $start->{forcecpu};
     }
 
     if ($self->{online_local_volumes}) {
@@ -786,12 +779,9 @@ sub phase2 {
     }
 
     my $spice_port;
-    my $unix_socket_info = {};
-    # version > 0 for unix socket support
-    my $nbd_protocol_version = 1;
     # TODO change to 'spice_ticket: <ticket>\n' in 7.0
-    my $input = $spice_ticket ? "$spice_ticket\n" : "\n";
-    $input .= "nbd_protocol_version: $nbd_protocol_version\n";
+    my $input = $migrate->{spice_ticket} ? "$migrate->{spice_ticket}\n" : "\n";
+    $input .= "nbd_protocol_version: $migrate->{nbd_proto_version}\n";
 
     my $number_of_online_replicated_volumes = 0;
 
@@ -811,20 +801,20 @@ sub phase2 {
     my $exitcode = PVE::Tools::run_command($cmd, input => $input, outfunc => sub {
 	my $line = shift;
 
-	if ($line =~ m/^migration listens on tcp:(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)$/) {
-	    $raddr = $1;
-	    $rport = int($2);
-	    $ruri = "tcp:$raddr:$rport";
+	if ($line =~ m/^migration listens on (tcp):(localhost|[\d\.]+|\[[\d\.:a-fA-F]+\]):(\d+)$/) {
+	    $tunnel_info->{addr} = $2;
+	    $tunnel_info->{port} = int($3);
+	    $tunnel_info->{proto} = $1;
 	}
-	elsif ($line =~ m!^migration listens on unix:(/run/qemu-server/(\d+)\.migrate)$!) {
-	    $raddr = $1;
-	    die "Destination UNIX sockets VMID does not match source VMID" if $vmid ne $2;
-	    $ruri = "unix:$raddr";
+	elsif ($line =~ m!^migration listens on (unix):(/run/qemu-server/(\d+)\.migrate)$!) {
+	    $tunnel_info->{addr} = $2;
+	    die "Destination UNIX sockets VMID does not match source VMID" if $vmid ne $3;
+	    $tunnel_info->{proto} = $1;
 	}
 	elsif ($line =~ m/^migration listens on port (\d+)$/) {
-	    $raddr = "localhost";
-	    $rport = int($1);
-	    $ruri = "tcp:$raddr:$rport";
+	    $tunnel_info->{addr} = "localhost";
+	    $tunnel_info->{port} = int($1);
+	    $tunnel_info->{proto} = "tcp";
 	}
 	elsif ($line =~ m/^spice listens on port (\d+)$/) {
 	    $spice_port = int($1);
@@ -849,7 +839,7 @@ sub phase2 {
 	    $self->{stopnbd} = 1;
 	    $self->{target_drive}->{$targetdrive}->{drivestr} = $drivestr;
 	    $self->{target_drive}->{$targetdrive}->{nbd_uri} = $nbd_uri;
-	    $unix_socket_info->{$nbd_unix_addr} = 1;
+	    $tunnel_info->{unix_sockets}->{$nbd_unix_addr} = 1;
 	} elsif ($line =~ m/^re-using replicated volume: (\S+) - (.*)$/) {
 	    my $drive = $1;
 	    my $volid = $2;
@@ -864,14 +854,58 @@ sub phase2 {
 
     die "remote command failed with exit code $exitcode\n" if $exitcode;
 
-    die "unable to detect remote migration address\n" if !$raddr;
+    die "unable to detect remote migration address\n" if !$tunnel_info->{addr} || !$tunnel_info->{proto};
 
     if (scalar(keys %$target_replicated_volumes) != $number_of_online_replicated_volumes) {
 	die "number of replicated disks on source and target node do not match - target node too old?\n"
     }
 
+    return ($tunnel_info, $spice_port);
+}
+
+sub phase2 {
+    my ($self, $vmid) = @_;
+
+    my $conf = $self->{vmconf};
+
+    # version > 0 for unix socket support
+    my $nbd_protocol_version = 1;
+
+    my $spice_ticket;
+    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) {
+	my $res = mon_cmd($vmid, 'query-spice');
+	$spice_ticket = $res->{ticket};
+    }
+
+    my $migration_type = $self->{opts}->{migration_type};
+    my $state_uri = $migration_type eq 'insecure' ? 'tcp' : 'unix';
+
+    my $params = {
+	start_params => {
+	    statefile => $state_uri,
+	    forcemachine => $self->{forcemachine},
+	    forcecpu => $self->{forcecpu},
+	    skiplock => 1,
+	},
+	migrate_opts => {
+	    spice_ticket => $spice_ticket,
+	    type => $migration_type,
+	    network => $self->{opts}->{migration_network},
+	    storagemap => $self->{opts}->{storagemap},
+	    migratedfrom => PVE::INotify::nodename(),
+	    nbd_proto_version => $nbd_protocol_version,
+	    nbd => $self->{nbd},
+	},
+    };
+
+    my ($tunnel_info, $spice_port) = $self->phase2_start_local_cluster($vmid, $params);
+
     $self->log('info', "start remote tunnel");
-    $self->start_remote_tunnel($raddr, $rport, $ruri, $unix_socket_info);
+    $self->start_remote_tunnel($tunnel_info);
+
+    my $migrate_uri = "$tunnel_info->{proto}:$tunnel_info->{addr}";
+    $migrate_uri .= ":$tunnel_info->{port}"
+	if defined($tunnel_info->{port});
 
     my $start = time();
 
@@ -908,7 +942,7 @@ sub phase2 {
 	}
     }
 
-    $self->log('info', "starting online/live migration on $ruri");
+    $self->log('info', "starting online/live migration on $migrate_uri");
     $self->{livemigration} = 1;
 
     # load_defaults
@@ -981,12 +1015,12 @@ sub phase2 {
 
     }
 
-    $self->log('info', "start migrate command to $ruri");
+    $self->log('info', "start migrate command to $migrate_uri");
     eval {
-	mon_cmd($vmid, "migrate", uri => $ruri);
+	mon_cmd($vmid, "migrate", uri => $migrate_uri);
     };
     my $merr = $@;
-    $self->log('info', "migrate uri => $ruri failed: $merr") if $merr;
+    $self->log('info', "migrate uri => $migrate_uri failed: $merr") if $merr;
 
     my $lstat = 0;
     my $usleep = 1000000;
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index d323d3d..a131fc8 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5141,10 +5141,10 @@ sub vm_start_nolock {
 	return $migration_ip;
     };
 
-    my $migrate_uri;
     if ($statefile) {
 	if ($statefile eq 'tcp') {
-	    my $localip = "localhost";
+	    my $migrate = $res->{migrate} = { proto => 'tcp' };
+	    $migrate->{addr} = "localhost";
 	    my $datacenterconf = PVE::Cluster::cfs_read_file('datacenter.cfg');
 	    my $nodename = nodename();
 
@@ -5157,26 +5157,26 @@ sub vm_start_nolock {
 	    }
 
 	    if ($migration_type eq 'insecure') {
-		$localip = $get_migration_ip->($nodename);
-		$localip = "[$localip]" if Net::IP::ip_is_ipv6($localip);
+		$migrate->{addr} = $get_migration_ip->($nodename);
+		$migrate->{addr} = "[$migrate->{addr}]" if Net::IP::ip_is_ipv6($migrate->{addr});
 	    }
 
 	    my $pfamily = PVE::Tools::get_host_address_family($nodename);
-	    my $migrate_port = PVE::Tools::next_migrate_port($pfamily);
-	    $migrate_uri = "tcp:${localip}:${migrate_port}";
-	    push @$cmd, '-incoming', $migrate_uri;
+	    $migrate->{port} = PVE::Tools::next_migrate_port($pfamily);
+	    $migrate->{uri} = "tcp:$migrate->{addr}:$migrate->{port}";
+	    push @$cmd, '-incoming', $migrate->{uri};
 	    push @$cmd, '-S';
 
 	} elsif ($statefile eq 'unix') {
 	    # should be default for secure migrations as a ssh TCP forward
 	    # tunnel is not deterministic reliable ready and fails regurarly
 	    # to set up in time, so use UNIX socket forwards
-	    my $socket_addr = "/run/qemu-server/$vmid.migrate";
-	    unlink $socket_addr;
+	    my $migrate = $res->{migrate} = { proto => 'unix' };
+	    $migrate->{addr} = "/run/qemu-server/$vmid.migrate";
+	    unlink $migrate->{addr};
 
-	    $migrate_uri = "unix:$socket_addr";
-
-	    push @$cmd, '-incoming', $migrate_uri;
+	    $migrate->{uri} = "unix:$migrate->{addr}";
+	    push @$cmd, '-incoming', $migrate->{uri};
 	    push @$cmd, '-S';
 
 	} elsif (-e $statefile) {
@@ -5297,10 +5297,9 @@ sub vm_start_nolock {
 	die "start failed: $err";
     }
 
-    print "migration listens on $migrate_uri\n" if $migrate_uri;
-    $res->{migrate_uri} = $migrate_uri;
-
-    if ($statefile && $statefile ne 'tcp' && $statefile ne 'unix')  {
+    if (defined($res->{migrate})) {
+	print "migration listens on $res->{migrate}->{uri}\n";
+    } elsif ($statefile) {
 	eval { mon_cmd($vmid, "cont"); };
 	warn $@ if $@;
     }
@@ -5315,6 +5314,7 @@ sub vm_start_nolock {
 	    my $socket_path = "/run/qemu-server/$vmid\_nbd.migrate";
 	    mon_cmd($vmid, "nbd-server-start", addr => { type => 'unix', data => { path => $socket_path } } );
 	    $migrate_storage_uri = "nbd:unix:$socket_path";
+	    $res->{migrate}->{unix_sockets} = [$socket_path];
 	} else {
 	    my $nodename = nodename();
 	    my $localip = $get_migration_ip->($nodename);
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH qemu-server 6/7] migrate: add remote migration handling
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (19 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 5/7] migrate: refactor remote VM/tunnel start Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 7/7] api: add remote migrate endpoint Fabian Grünbichler
  2021-04-15 14:04 ` [pve-devel] [RFC qemu-server++ 0/22] remote migration alexandre derumier
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

remote migration uses a websocket connection to a task worker running on
the target node instead of commands via SSH to control the migration.
this websocket tunnel is started earlier than the SSH tunnel, and allows
adding UNIX-socket forwarding over additional websocket connections
on-demand.

the main differences to regular intra-cluster migration are:
- source VM config and disks are not removed upon migration
- shared storages are treated like local storages, since we can't
assume they are shared across clusters
- NBD migrated disks are explicitly pre-allocated on the target node via
tunnel command before starting the target VM instance
- in addition to storages, network bridges and the VMID itself is
transformed via a user defined mapping

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    requires
    - mtunnel endpoints on remote node
    - proxmox-websocket-tunnel on local node

 PVE/QemuMigrate.pm | 592 +++++++++++++++++++++++++++++++++++++++------
 PVE/QemuServer.pm  |   8 +-
 2 files changed, 521 insertions(+), 79 deletions(-)

diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm
index 5d44c51..e1053d0 100644
--- a/PVE/QemuMigrate.pm
+++ b/PVE/QemuMigrate.pm
@@ -7,7 +7,13 @@ use IO::File;
 use IPC::Open2;
 use POSIX qw( WNOHANG );
 use Time::HiRes qw( usleep );
+use JSON qw(encode_json decode_json);
+use IO::Socket::UNIX;
+use Socket qw(SOCK_STREAM);
+use Storable qw(dclone);
+use URI::Escape;
 
+use PVE::APIClient::LWP;
 use PVE::Cluster;
 use PVE::INotify;
 use PVE::RPCEnvironment;
@@ -83,7 +89,7 @@ sub finish_command_pipe {
 	}
     }
 
-    $self->log('info', "ssh tunnel still running - terminating now with SIGTERM\n");
+    $self->log('info', "tunnel still running - terminating now with SIGTERM\n");
     kill(15, $cpid);
 
     # wait again
@@ -92,11 +98,11 @@ sub finish_command_pipe {
 	sleep(1);
     }
 
-    $self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n");
+    $self->log('info', "tunnel still running - terminating now with SIGKILL\n");
     kill 9, $cpid;
     sleep 1;
 
-    $self->log('err', "ssh tunnel child process (PID $cpid) couldn't be collected\n")
+    $self->log('err', "tunnel child process (PID $cpid) couldn't be collected\n")
 	if !&$collect_child_process();
 }
 
@@ -113,18 +119,28 @@ sub read_tunnel {
     };
     die "reading from tunnel failed: $@\n" if $@;
 
-    chomp $output;
+    chomp $output if defined($output);
 
     return $output;
 }
 
 sub write_tunnel {
-    my ($self, $tunnel, $timeout, $command) = @_;
+    my ($self, $tunnel, $timeout, $command, $params) = @_;
 
     $timeout = 60 if !defined($timeout);
 
     my $writer = $tunnel->{writer};
 
+    if ($tunnel->{version} && $tunnel->{version} >= 2) {
+	my $object = defined($params) ? dclone($params) : {};
+	$object->{cmd} = $command;
+
+	$command = eval { JSON::encode_json($object) };
+
+	die "failed to encode command as JSON - $@\n"
+	    if $@;
+    }
+
     eval {
 	PVE::Tools::run_with_timeout($timeout, sub {
 	    print $writer "$command\n";
@@ -134,13 +150,29 @@ sub write_tunnel {
     die "writing to tunnel failed: $@\n" if $@;
 
     if ($tunnel->{version} && $tunnel->{version} >= 1) {
-	my $res = eval { $self->read_tunnel($tunnel, 10); };
+	my $res = eval { $self->read_tunnel($tunnel, $timeout); };
 	die "no reply to command '$command': $@\n" if $@;
 
-	if ($res eq 'OK') {
-	    return;
+	if ($tunnel->{version} == 1) {
+	    if ($res eq 'OK') {
+		return;
+	    } else {
+		die "tunnel replied '$res' to command '$command'\n";
+	    }
 	} else {
-	    die "tunnel replied '$res' to command '$command'\n";
+	    my $parsed = eval { JSON::decode_json($res) };
+	    die "failed to decode tunnel reply '$res' (command '$command') - $@\n"
+		if $@;
+
+	    if (!$parsed->{success}) {
+		if (defined($parsed->{msg})) {
+		    die "error - tunnel command '$command' failed - $parsed->{msg}\n";
+		} else {
+		    die "error - tunnel command '$command' failed\n";
+		}
+	    }
+
+	    return $parsed;
 	}
     }
 }
@@ -183,10 +215,149 @@ sub fork_tunnel {
     return $tunnel;
 }
 
+my $forward_unix_socket = sub {
+    my ($self, $local, $remote) = @_;
+
+    my $params = dclone($self->{tunnel}->{params});
+    $params->{unix} = $local;
+    $params->{url} = $params->{url} ."socket=$remote&";
+    $params->{ticket} = { path => $remote };
+
+    my $cmd = encode_json({
+	control => JSON::true,
+	cmd => 'forward',
+	data => $params,
+    });
+
+    my $writer = $self->{tunnel}->{writer};
+    eval {
+	unlink $local;
+	PVE::Tools::run_with_timeout(15, sub {
+	    print $writer "$cmd\n";
+	    $writer->flush();
+	});
+    };
+    die "failed to write forwarding command - $@\n" if $@;
+
+    $self->read_tunnel($self->{tunnel});
+
+    $self->log('info', "Forwarded local unix socket '$local' to remote '$remote' via websocket tunnel");
+};
+
+sub fork_websocket_tunnel {
+    my ($self, $storages) = @_;
+
+    my $remote = $self->{opts}->{remote};
+    my $conn = $remote->{conn};
+
+    my $websocket_url = "https://$conn->{host}:$conn->{port}/api2/json/nodes/$self->{node}/qemu/$remote->{vmid}/mtunnelwebsocket";
+
+    my $params = {
+	url => $websocket_url,
+    };
+
+    if (my $apitoken = $conn->{apitoken}) {
+	$params->{headers} = [["Authorization", "$apitoken"]];
+    } else {
+	die "can't connect to remote host without credentials\n";
+    }
+
+    if (my $fps = $conn->{cached_fingerprints}) {
+	$params->{fingerprint} = (keys %$fps)[0];
+    }
+
+    my $api_client = PVE::APIClient::LWP->new(%$conn);
+    my $storage_list = join(',', keys %$storages);
+    my $res = $api_client->post("/nodes/$self->{node}/qemu/$remote->{vmid}/mtunnel", { storages => $storage_list });
+    $self->log('info', "remote: started migration tunnel worker '$res->{upid}'");
+    $params->{url} .= "?ticket=".uri_escape($res->{ticket});
+    $params->{url} .= "&socket=$res->{socket}";
+
+    my $reader = IO::Pipe->new();
+    my $writer = IO::Pipe->new();
+
+    my $cpid = fork();
+    if ($cpid) {
+	$writer->writer();
+	$reader->reader();
+	my $tunnel = { writer => $writer, reader => $reader, pid => $cpid };
+
+	eval {
+	    my $writer = $tunnel->{writer};
+	    my $cmd = encode_json({
+		control => JSON::true,
+		cmd => 'connect',
+		data => $params,
+	    });
+
+	    eval {
+		PVE::Tools::run_with_timeout(15, sub {
+		    print {$writer} "$cmd\n";
+		    $writer->flush();
+		});
+	    };
+	    die "failed to write tunnel connect command - $@\n" if $@;
+	};
+	die "failed to connect via WS: $@\n" if $@;
+
+	my $err;
+        eval {
+	    my $writer = $tunnel->{writer};
+	    my $cmd = encode_json({
+		cmd => 'version',
+	    });
+
+	    eval {
+		PVE::Tools::run_with_timeout(15, sub {
+		    print {$writer} "$cmd\n";
+		    $writer->flush();
+		});
+	    };
+	    $err = "failed to write tunnel version command - $@\n" if $@;
+	    my $ver = $self->read_tunnel($tunnel, 10);
+	    $ver = JSON::decode_json($ver);
+	    $ver = $ver->{tunnel};
+
+	    if ($ver =~ /^(\d+)$/) {
+		$tunnel->{version} = $1;
+		$self->log('info', "tunnel info: $ver\n");
+	    } else {
+		$err = "received invalid tunnel version string '$ver'\n" if !$err;
+	    }
+	};
+	$err = $@ if !$err;
+
+	if ($err) {
+	    $self->finish_command_pipe($tunnel);
+	    die "can't open migration tunnel - $err";
+	}
+
+	$params->{url} = "$websocket_url?";
+	$tunnel->{params} = $params; # for forwarding
+
+	return $tunnel;
+    } else {
+	eval {
+	    $writer->reader();
+	    $reader->writer();
+	    PVE::Tools::run_command(
+		['proxmox-websocket-tunnel'],
+		input => "<&".fileno($writer),
+		output => ">&".fileno($reader),
+		errfunc => sub { my $line = shift; print "tunnel: $line\n"; },
+	    );
+	};
+	warn "CMD websocket tunnel died: $@\n" if $@;
+	exit 0;
+    }
+}
+
 sub finish_tunnel {
-    my ($self, $tunnel) = @_;
+    my ($self, $tunnel, $cleanup) = @_;
 
-    eval { $self->write_tunnel($tunnel, 30, 'quit'); };
+    $cleanup = $cleanup ? 1 : 0;
+
+    eval { $self->write_tunnel($tunnel, 30, 'quit', { cleanup => $cleanup }); };
     my $err = $@;
 
     $self->finish_command_pipe($tunnel, 30);
@@ -337,14 +508,19 @@ sub prepare {
 
     my $vollist = PVE::QemuServer::get_vm_volumes($conf);
 
+    my $storages = {};
     foreach my $volid (@$vollist) {
 	my ($sid, $volname) = PVE::Storage::parse_volume_id($volid, 1);
 
-	# check if storage is available on both nodes
 	my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
+	$storages->{$targetsid} = 1;
 
+	# check if storage is available on source node
 	my $scfg = PVE::Storage::storage_check_node($self->{storecfg}, $sid);
-	PVE::Storage::storage_check_node($self->{storecfg}, $targetsid, $self->{node});
+
+	# check if storage is available on target node
+	PVE::Storage::storage_check_node($self->{storecfg}, $targetsid, $self->{node})
+	    if !$self->{opts}->{remote};
 
 	if ($scfg->{shared}) {
 	    # PVE::Storage::activate_storage checks this for non-shared storages
@@ -354,10 +530,16 @@ sub prepare {
 	}
     }
 
-    # test ssh connection
-    my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
-    eval { $self->cmd_quiet($cmd); };
-    die "Can't connect to destination address using public key\n" if $@;
+    if ($self->{opts}->{remote}) {
+	# test & establish websocket connection
+	$self->{tunnel} = $self->fork_websocket_tunnel($storages);
+	print "websocket tunnel started\n";
+    } else {
+	# test ssh connection
+	my $cmd = [ @{$self->{rem_ssh}}, '/bin/true' ];
+	eval { $self->cmd_quiet($cmd); };
+	die "Can't connect to destination address using public key\n" if $@;
+    }
 
     return $running;
 }
@@ -395,7 +577,7 @@ sub sync_disks {
 	my @sids = PVE::Storage::storage_ids($storecfg);
 	foreach my $storeid (@sids) {
 	    my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
-	    next if $scfg->{shared};
+	    next if $scfg->{shared} && !$self->{opts}->{remote};
 	    next if !PVE::Storage::storage_check_enabled($storecfg, $storeid, undef, 1);
 
 	    # get list from PVE::Storage (for unused volumes)
@@ -403,15 +585,17 @@ sub sync_disks {
 
 	    next if @{$dl->{$storeid}} == 0;
 
-	    my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $storeid);
-	    # check if storage is available on target node
-	    PVE::Storage::storage_check_node($storecfg, $targetsid, $self->{node});
+	    if (!$self->{opts}->{remote}) {
+		my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $storeid);
+		# check if storage is available on target node
+		PVE::Storage::storage_check_node($storecfg, $targetsid, $self->{node});
 
-	    # grandfather in existing mismatches
-	    if ($targetsid ne $storeid) {
-		my $target_scfg = PVE::Storage::storage_config($storecfg, $targetsid);
-		die "content type 'images' is not available on storage '$targetsid'\n"
-		    if !$target_scfg->{content}->{images};
+		# grandfather in existing mismatches
+		if ($targetsid ne $storeid) {
+		    my $target_scfg = PVE::Storage::storage_config($storecfg, $targetsid);
+		    die "content type 'images' is not available on storage '$targetsid'\n"
+			if !$target_scfg->{content}->{images};
+		}
 	    }
 
 	    PVE::Storage::foreach_volid($dl, sub {
@@ -456,12 +640,16 @@ sub sync_disks {
 
 	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
 
-	    my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
-	    # check if storage is available on both nodes
+	    # check if storage is available on source node
 	    my $scfg = PVE::Storage::storage_check_node($storecfg, $sid);
-	    PVE::Storage::storage_check_node($storecfg, $targetsid, $self->{node});
 
-	    return if $scfg->{shared};
+	    # check target storage on target node if intra-cluster migration
+	    if (!$self->{opts}->{remote}) {
+		my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
+		PVE::Storage::storage_check_node($storecfg, $targetsid, $self->{node});
+
+		return if $scfg->{shared};
+	    }
 
 	    $local_volumes->{$volid}->{ref} = $attr->{referenced_in_config} ? 'config' : 'snapshot';
 	    $local_volumes->{$volid}->{ref} = 'storage' if $attr->{is_unused};
@@ -543,6 +731,9 @@ sub sync_disks {
 
 	    my $migratable = $scfg->{type} =~ /^(?:dir|zfspool|lvmthin|lvm)$/;
 
+	    # TODO: what is this even here for?
+	    $migratable = 1 if $self->{opts}->{remote};
+
 	    die "can't migrate '$volid' - storage type '$scfg->{type}' not supported\n"
 		if !$migratable;
 
@@ -553,6 +744,9 @@ sub sync_disks {
 	}
 
 	if ($self->{replication_jobcfg}) {
+	    die "can't migrate VM with replicated volumes to remote cluster/node\n"
+		if $self->{opts}->{remote};
+
 	    if ($self->{running}) {
 
 		my $version = PVE::QemuServer::kvm_user_version();
@@ -615,6 +809,7 @@ sub sync_disks {
 
 	$self->log('info', "copying local disk images") if scalar(%$local_volumes);
 
+	my $forwarded = 0;
 	foreach my $volid (sort keys %$local_volumes) {
 	    my ($sid, $volname) = PVE::Storage::parse_volume_id($volid);
 	    my $targetsid = PVE::QemuServer::map_id($self->{opts}->{storagemap}, $sid);
@@ -628,27 +823,121 @@ sub sync_disks {
 		next;
 	    } else {
 		next if $self->{replicated_volumes}->{$volid};
+
 		push @{$self->{volumes}}, $volid;
+		my $new_volid;
+		
 		my $opts = $self->{opts};
-		# use 'migrate' limit for transfer to other node
-		my $bwlimit = PVE::Storage::get_bandwidth_limit('migration', [$targetsid, $sid], $opts->{bwlimit});
-		# JSONSchema and get_bandwidth_limit use kbps - storage_migrate bps
-		$bwlimit = $bwlimit * 1024 if defined($bwlimit);
-
-		my $storage_migrate_opts = {
-		    'ratelimit_bps' => $bwlimit,
-		    'insecure' => $opts->{migration_type} eq 'insecure',
-		    'with_snapshots' => $local_volumes->{$volid}->{snapshots},
-		    'allow_rename' => !$local_volumes->{$volid}->{is_vmstate},
-		};
-
-		my $logfunc = sub { $self->log('info', $_[0]); };
-		my $new_volid = eval {
-		    PVE::Storage::storage_migrate($storecfg, $volid, $self->{ssh_info},
-						  $targetsid, $storage_migrate_opts, $logfunc);
-		};
-		if (my $err = $@) {
-		    die "storage migration for '$volid' to storage '$targetsid' failed - $err\n";
+		if (my $remote = $opts->{remote}) {
+		    my $remote_vmid = $remote->{vmid};
+		    my (undef, $name, undef, undef, undef, undef, $format) = PVE::Storage::parse_volname($storecfg, $volid);
+		    my $scfg = PVE::Storage::storage_config($storecfg, $sid);
+		    PVE::Storage::activate_volumes($storecfg, [$volid]);
+
+		    # use 'migrate' limit for transfer to other node
+		    # TODO: get limit from remote cluster as well?
+		    my $bwlimit = PVE::Storage::get_bandwidth_limit('migration', [$sid], $opts->{bwlimit});
+		    # JSONSchema and get_bandwidth_limit use kbps - storage_migrate bps
+		    $bwlimit = $bwlimit * 1024 if defined($bwlimit);
+
+		    my $with_snapshots = $local_volumes->{$volid}->{snapshots} ? 1 : 0;
+		    my $snapshot;
+		    if ($scfg->{type} eq 'zfspool') {
+			$snapshot = '__migration__';
+			$with_snapshots = 1;
+			PVE::Storage::volume_snapshot($storecfg, $volid, $snapshot);
+		    }
+
+		    if ($self->{vmid} != $remote_vmid) {
+			$name =~ s/-$self->{vmid}-/-$remote_vmid-/g;
+			$name =~ s/^$self->{vmid}\//$remote_vmid\//;
+		    }
+
+		    my @export_formats = PVE::Storage::volume_export_formats($storecfg, $volid, undef, undef, $with_snapshots);
+
+		    my $storage_migrate_opts = {
+			format => $format,
+			storage => $targetsid,
+			'with-snapshots' => $with_snapshots,
+			'allow-rename' => !$local_volumes->{$volid}->{is_vmstate},
+			'export-formats' => @export_formats,
+			volname => $name,
+		    };
+		    my $res = $self->write_tunnel($self->{tunnel}, 600, 'disk-import', $storage_migrate_opts);
+		    my $local = "/run/qemu-server/$self->{vmid}.storage";
+		    if (!$forwarded) {
+			$forward_unix_socket->($self, $local, $res->{socket});
+			$forwarded = 1;
+		    }
+		    my $socket = IO::Socket::UNIX->new(Peer => $local, Type => SOCK_STREAM())
+			or die "failed to connect to websocket tunnel at $local\n";
+		    # we won't be reading from the socket
+		    shutdown($socket, 0);
+		    my $send = ['pvesm', 'export', $volid, $res->{format}, '-', '-with-snapshots', $with_snapshots];
+		    push @$send, '-snapshot', $snapshot if $snapshot;
+
+		    my @cstream;
+		    if (defined($bwlimit)) {
+			@cstream = ([ '/usr/bin/cstream', '-t', $bwlimit ]);
+			$self->log('info', "using a bandwidth limit of $bwlimit bps for transferring '$volid'");
+		    }
+
+		    eval {
+			PVE::Tools::run_command(
+			    [$send, @cstream],
+			    output => '>&'.fileno($socket),
+			    errfunc => sub { my $line = shift; $self->log('warn', $line); },
+			);
+		    };
+		    my $send_error = $@;
+
+		    # don't close the connection entirely otherwise the
+		    # receiving end might not get all buffered data (and
+		    # fails with 'connection reset by peer')
+		    shutdown($socket, 1);
+
+		    # wait for the remote process to finish
+		    while ($res = $self->write_tunnel($self->{tunnel}, 10, 'query-disk-import')) {
+			if ($res->{status} eq 'pending') {
+			    $self->log('info', "waiting for disk import to finish..\n");
+			    sleep(1)
+			} elsif ($res->{status} eq 'complete') {
+			    $new_volid = $res->{volid};
+			    last;
+			} else {
+			    die "unknown query-disk-import result: $res->{status}\n";
+			}
+		    }
+
+		    # now close the socket
+		    close($socket);
+		    die $send_error if $send_error;
+		} else {
+		    # use 'migrate' limit for transfer to other node
+		    my $bwlimit = PVE::Storage::get_bandwidth_limit('migration', [$sid], $opts->{bwlimit});
+		    # JSONSchema and get_bandwidth_limit use kbps - storage_migrate bps
+		    $bwlimit = $bwlimit * 1024 if defined($bwlimit);
+		    my $storage_migrate_opts = {
+			'ratelimit_bps' => $bwlimit,
+			'insecure' => $opts->{migration_type} eq 'insecure',
+			'with_snapshots' => $local_volumes->{$volid}->{snapshots},
+			'allow_rename' => !$local_volumes->{$volid}->{is_vmstate},
+		    };
+
+		    my $logfunc = sub { $self->log('info', $_[0]); };
+		    $new_volid = eval {
+			PVE::Storage::storage_migrate(
+			    $storecfg,
+			    $volid,
+			    $self->{ssh_info},
+			    $targetsid,
+			    $storage_migrate_opts,
+			    $logfunc,
+			);
+		    };
+		    if (my $err = $@) {
+			die "storage migration for '$volid' to storage '$targetsid' failed - $err\n";
+		    }
 		}
 
 		$self->{volume_map}->{$volid} = $new_volid;
@@ -667,6 +956,12 @@ sub sync_disks {
 sub cleanup_remotedisks {
     my ($self) = @_;
 
+    if ($self->{opts}->{remote}) {
+	$self->finish_tunnel($self, $self->{tunnel}, 1);
+	delete $self->{tunnel};
+	return;
+    }
+
     foreach my $target_drive (keys %{$self->{target_drive}}) {
 	my $drivestr = $self->{target_drive}->{$target_drive}->{drivestr};
 	next if !defined($drivestr);
@@ -714,8 +1009,71 @@ sub phase1 {
     # sync_disks fixes disk sizes to match their actual size, write changes so
     # target allocates correct volumes
     PVE::QemuConfig->write_config($vmid, $conf);
+
+    $self->phase1_remote($vmid) if $self->{opts}->{remote};
 };
 
+sub phase1_remote {
+    my ($self, $vmid) = @_;
+
+    my $remote_conf = PVE::QemuConfig->load_config($vmid);
+    PVE::QemuConfig->update_volume_ids($remote_conf, $self->{volume_map});
+
+    my $storage_map = $self->{opts}->{storagemap};
+    $self->{nbd} = {};
+    PVE::QemuConfig->foreach_volume($remote_conf, sub {
+	my ($ds, $drive) = @_;
+
+	# TODO eject CDROM?
+	return if PVE::QemuServer::drive_is_cdrom($drive);
+
+	my $volid = $drive->{file};
+	return if !$volid;
+
+	return if !grep { $_ eq $volid} @{$self->{online_local_volumes}};
+
+	my ($storeid, $volname) = PVE::Storage::parse_volume_id($volid);
+	my $scfg = PVE::Storage::storage_config($self->{storecfg}, $storeid);
+	my $source_format = PVE::QemuServer::qemu_img_format($scfg, $volname);
+
+	# set by target cluster
+	my $oldvolid = delete $drive->{file};
+	delete $drive->{format};
+
+	my $targetsid = PVE::QemuServer::map_id($storage_map, $storeid);
+
+	my $params = {
+	    format => $source_format,
+	    storage => $targetsid,
+	    drive => $drive,
+	};
+
+	$self->log('info', "Allocating volume for drive '$ds' on remote storage '$targetsid'..");
+	my $res = $self->write_tunnel($self->{tunnel}, 600, 'disk', $params);
+
+	$self->log('info', "mapped: $oldvolid => $res->{volid}");
+	$remote_conf->{$ds} = $res->{drivestr};
+	$self->{nbd}->{$ds} = $res;
+    });
+
+    # TODO: check bridge availability earlier?
+    my $bridgemap = $self->{opts}->{bridgemap};
+    foreach my $opt (keys %$remote_conf) {
+	next if $opt !~ m/^net\d+$/;
+
+	next if !$remote_conf->{$opt};
+	my $d = PVE::QemuServer::parse_net($remote_conf->{$opt});
+	next if !$d || !$d->{bridge};
+
+	my $target_bridge = PVE::QemuServer::map_id($bridgemap, $d->{bridge});
+	$self->log('info', "mapped: $opt from $d->{bridge} to $target_bridge");
+	$d->{bridge} = $target_bridge;
+	$remote_conf->{$opt} = PVE::QemuServer::print_net($d);
+    }
+
+    $self->write_tunnel($self->{tunnel}, 10, 'config', { conf => $remote_conf });
+}
+
 sub phase1_cleanup {
     my ($self, $vmid, $err) = @_;
 
@@ -863,6 +1221,28 @@ sub phase2_start_local_cluster {
     return ($tunnel_info, $spice_port);
 }
 
+sub phase2_start_remote_cluster {
+    my ($self, $vmid, $params) = @_;
+
+    die "insecure migration to remote cluster not implemented\n"
+	if $params->{migrate_opts}->{type} ne 'websocket';
+
+    my $remote_vmid = $self->{opts}->{remote}->{vmid};
+
+    my $res = $self->write_tunnel($self->{tunnel}, 10, "start", $params);
+
+    foreach my $drive (keys %{$res->{drives}}) {
+	$self->{target_drive}->{$drive}->{drivestr} = $res->{drives}->{$drive}->{drivestr};
+	my $nbd_uri = $res->{drives}->{$drive}->{nbd_uri};
+	die "unexpected NBD uri for '$drive': $nbd_uri\n"
+	    if $nbd_uri !~ s!/run/qemu-server/$remote_vmid\_!/run/qemu-server/$vmid\_!;
+
+	$self->{target_drive}->{$drive}->{nbd_uri} = $nbd_uri;
+    }
+
+    return ($res->{migrate}, $res->{spice_port});
+}
+
 sub phase2 {
     my ($self, $vmid) = @_;
 
@@ -898,10 +1278,39 @@ sub phase2 {
 	},
     };
 
-    my ($tunnel_info, $spice_port) = $self->phase2_start_local_cluster($vmid, $params);
+    my ($tunnel_info, $spice_port);
+
+    if (my $remote = $self->{opts}->{remote}) {
+	my $remote_vmid = $remote->{vmid};
+	$params->{migrate_opts}->{remote_node} = $self->{node};
+	($tunnel_info, $spice_port) = $self->phase2_start_remote_cluster($vmid, $params);
+	die "only UNIX sockets are supported for remote migration\n"
+	    if $tunnel_info->{proto} ne 'unix';
+
+	my $forwarded = {};
+	my $remote_socket = $tunnel_info->{addr};
+	my $local_socket = $remote_socket;
+	$local_socket =~ s/$remote_vmid/$vmid/g;
+	$tunnel_info->{addr} = $local_socket;
+
+	$self->log('info', "forwarding migration socket '$local_socket' => '$remote_socket'");
+	$forward_unix_socket->($self, $local_socket, $remote_socket);
+	$forwarded->{$local_socket} = 1;
+
+	foreach my $remote_socket (@{$tunnel_info->{unix_sockets}}) {
+	    my $local_socket = $remote_socket;
+	    $local_socket =~ s/$remote_vmid/$vmid/g;
+	    next if $forwarded->{$local_socket};
+	    $self->log('info', "forwarding migration socket '$local_socket' => '$remote_socket'");
+	    $forward_unix_socket->($self, $local_socket, $remote_socket);
+	    $forwarded->{$local_socket} = 1;
+	}
+    } else {
+	($tunnel_info, $spice_port) = $self->phase2_start_local_cluster($vmid, $params);
 
-    $self->log('info', "start remote tunnel");
-    $self->start_remote_tunnel($tunnel_info);
+	$self->log('info', "start remote tunnel");
+	$self->start_remote_tunnel($tunnel_info);
+    }
 
     my $migrate_uri = "$tunnel_info->{proto}:$tunnel_info->{addr}";
     $migrate_uri .= ":$tunnel_info->{port}"
@@ -923,13 +1332,15 @@ sub phase2 {
 	    my $nbd_uri = $target->{nbd_uri};
 
 	    my $source_drive = PVE::QemuServer::parse_drive($drive, $conf->{$drive});
-	    my $target_drive = PVE::QemuServer::parse_drive($drive, $target->{drivestr});
-
 	    my $source_volid = $source_drive->{file};
-	    my $target_volid = $target_drive->{file};
-
 	    my $source_sid = PVE::Storage::Plugin::parse_volume_id($source_volid);
-	    my $target_sid = PVE::Storage::Plugin::parse_volume_id($target_volid);
+
+	    my ($target_volid, $target_sid);
+	    if (!$self->{opts}->{remote}) {
+		my $target_drive = PVE::QemuServer::parse_drive($drive, $target->{drivestr});
+		$target_volid = $target_drive->{file};
+		$target_sid = PVE::Storage::Plugin::parse_volume_id($target_volid);
+	    }
 
 	    my $bwlimit = PVE::Storage::get_bandwidth_limit('migration', [$source_sid, $target_sid], $opt_bwlimit);
 	    my $bitmap = $target->{bitmap};
@@ -937,8 +1348,10 @@ sub phase2 {
 	    $self->log('info', "$drive: start migration to $nbd_uri");
 	    PVE::QemuServer::qemu_drive_mirror($vmid, $drive, $nbd_uri, $vmid, undef, $self->{storage_migration_jobs}, 'skip', undef, $bwlimit, $bitmap);
 
-	    $self->{volume_map}->{$source_volid} = $target_volid;
-	    $self->log('info', "volume '$source_volid' is '$target_volid' on the target\n");
+	    if ($target_volid) {
+		$self->{volume_map}->{$source_volid} = $target_volid;
+		$self->log('info', "volume '$source_volid' is '$target_volid' on the target\n");
+	    }
 	}
     }
 
@@ -995,7 +1408,7 @@ sub phase2 {
     };
     $self->log('info', "migrate-set-parameters error: $@") if $@;
 
-    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) {
+    if (PVE::QemuServer::vga_conf_has_spice($conf->{vga} && !$self->{opts}->{remote})) {
 	my $rpcenv = PVE::RPCEnvironment::get();
 	my $authuser = $rpcenv->get_user();
 
@@ -1159,11 +1572,15 @@ sub phase2_cleanup {
 
     my $nodename = PVE::INotify::nodename();
 
-    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'stop', $vmid, '--skiplock', '--migratedfrom', $nodename];
-    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
-    if (my $err = $@) {
-        $self->log('err', $err);
-        $self->{errors} = 1;
+    if ($self->{tunnel} && $self->{tunnel}->{version} >= 2) {
+	$self->write_tunnel($self->{tunnel}, 10, 'stop');
+    } else {
+	my $cmd = [@{$self->{rem_ssh}}, 'qm', 'stop', $vmid, '--skiplock', '--migratedfrom', $nodename];
+	eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
+	if (my $err = $@) {
+	    $self->log('err', $err);
+	    $self->{errors} = 1;
+	}
     }
 
     # cleanup after stopping, otherwise disks might be in-use by target VM!
@@ -1188,6 +1605,8 @@ sub phase3 {
     my $volids = $self->{volumes};
     return if $self->{phase2errors};
 
+    return if $self->{opts}->{remote};
+
     # destroy local copies
     foreach my $volid (@$volids) {
 	eval { PVE::Storage::vdisk_free($self->{storecfg}, $volid); };
@@ -1220,7 +1639,7 @@ sub phase3_cleanup {
 	}
     }
 
-    if ($self->{volume_map}) {
+    if ($self->{volume_map} && !$self->{opts}->{remote}) {
 	my $target_drives = $self->{target_drive};
 
 	# FIXME: for NBD storage migration we now only update the volid, and
@@ -1237,26 +1656,35 @@ sub phase3_cleanup {
 
     # transfer replication state before move config
     $self->transfer_replication_state() if $self->{is_replicated};
-    PVE::QemuConfig->move_config_to_node($vmid, $self->{node});
+    if ($self->{opts}->{remote}) {
+	# TODO decide whether to remove or lock&keep here
+    } else {
+	PVE::QemuConfig->move_config_to_node($vmid, $self->{node});
+    }
     $self->switch_replication_job_target() if $self->{is_replicated};
 
     if ($self->{livemigration}) {
 	if ($self->{stopnbd}) {
 	    $self->log('info', "stopping NBD storage migration server on target.");
 	    # stop nbd server on remote vm - requirement for resume since 2.9
-	    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'nbdstop', $vmid];
+	    if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 2) {
+		$self->write_tunnel($tunnel, 30, 'nbdstop');
+	    } else {
+		my $cmd = [@{$self->{rem_ssh}}, 'qm', 'nbdstop', $vmid];
 
-	    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
-	    if (my $err = $@) {
-		$self->log('err', $err);
-		$self->{errors} = 1;
+		eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
+		if (my $err = $@) {
+		    $self->log('err', $err);
+		    $self->{errors} = 1;
+		}
 	    }
 	}
 
 	# config moved and nbd server stopped - now we can resume vm on target
 	if ($tunnel && $tunnel->{version} && $tunnel->{version} >= 1) {
+	    my $cmd = $tunnel->{version} == 1 ? "resume $vmid" : "resume";
 	    eval {
-		$self->write_tunnel($tunnel, 30, "resume $vmid");
+		$self->write_tunnel($tunnel, 30, $cmd);
 	    };
 	    if (my $err = $@) {
 		$self->log('err', $err);
@@ -1276,18 +1704,21 @@ sub phase3_cleanup {
 	}
 
 	if ($self->{storage_migration} && PVE::QemuServer::parse_guest_agent($conf)->{fstrim_cloned_disks} && $self->{running}) {
+	    # TODO mtunnel command
 	    my $cmd = [@{$self->{rem_ssh}}, 'qm', 'guest', 'cmd', $vmid, 'fstrim'];
 	    eval{ PVE::Tools::run_command($cmd, outfunc => sub {}, errfunc => sub {}) };
 	}
     }
 
     # close tunnel on successful migration, on error phase2_cleanup closed it
-    if ($tunnel) {
+    if ($tunnel && $tunnel->{version} == 1) {
 	eval { finish_tunnel($self, $tunnel);  };
 	if (my $err = $@) {
 	    $self->log('err', $err);
 	    $self->{errors} = 1;
 	}
+	$tunnel = undef;
+	delete $self->{tunnel};
     }
 
     eval {
@@ -1321,7 +1752,8 @@ sub phase3_cleanup {
 	$self->{errors} = 1;
     }
 
-    if($self->{storage_migration}) {
+    # TODO decide whether we want a "keep source VM" option for remote migration
+    if ($self->{storage_migration} && !$self->{opts}->{remote}) {
 	# destroy local copies
 	my $volids = $self->{online_local_volumes};
 
@@ -1340,8 +1772,14 @@ sub phase3_cleanup {
     }
 
     # clear migrate lock
-    my $cmd = [ @{$self->{rem_ssh}}, 'qm', 'unlock', $vmid ];
-    $self->cmd_logerr($cmd, errmsg => "failed to clear migrate lock");
+    if ($tunnel && $tunnel->{version} >= 2) {
+	$self->write_tunnel($tunnel, 10, "unlock");
+
+	$self->finish_tunnel($tunnel);
+    } else {
+	my $cmd = [ @{$self->{rem_ssh}}, 'qm', 'unlock', $vmid ];
+	$self->cmd_logerr($cmd, errmsg => "failed to clear migrate lock");
+    }
 }
 
 sub final_cleanup {
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index a131fc8..a713258 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -5094,7 +5094,11 @@ sub vm_start_nolock {
     my $defaults = load_defaults();
 
     # set environment variable useful inside network script
-    $ENV{PVE_MIGRATED_FROM} = $migratedfrom if $migratedfrom;
+    if ($migrate_opts->{remote_node}) {
+	$ENV{PVE_MIGRATED_FROM} = $migrate_opts->{remote_node};
+    } elsif ($migratedfrom) {
+	$ENV{PVE_MIGRATED_FROM} = $migratedfrom;
+    }
 
     PVE::GuestHelpers::exec_hookscript($conf, $vmid, 'pre-start', 1);
 
@@ -5310,7 +5314,7 @@ sub vm_start_nolock {
 
 	my $migrate_storage_uri;
 	# nbd_protocol_version > 0 for unix socket support
-	if ($nbd_protocol_version > 0 && $migration_type eq 'secure') {
+	if ($nbd_protocol_version > 0 && ($migration_type eq 'secure' || $migration_type eq 'websocket')) {
 	    my $socket_path = "/run/qemu-server/$vmid\_nbd.migrate";
 	    mon_cmd($vmid, "nbd-server-start", addr => { type => 'unix', data => { path => $socket_path } } );
 	    $migrate_storage_uri = "nbd:unix:$socket_path";
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH qemu-server 7/7] api: add remote migrate endpoint
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (20 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 6/7] migrate: add remote migration handling Fabian Grünbichler
@ 2021-04-13 12:16 ` Fabian Grünbichler
  2021-04-15 14:04 ` [pve-devel] [RFC qemu-server++ 0/22] remote migration alexandre derumier
  22 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-13 12:16 UTC (permalink / raw)
  To: pve-devel

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
---

Notes:
    requires
    - pve-manager with 'addr' API endpoint on target node
    - pve-cluster with RemoteConfig support on local node
    - pve-common with bridgepair format
    - pve-guest-common with AbstractMigrate handling remote migration

 PVE/API2/Qemu.pm | 196 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 194 insertions(+), 2 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index bf5ca14..28dd323 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -39,6 +39,7 @@ use PVE::API2::Firewall::VM;
 use PVE::API2::Qemu::Agent;
 use PVE::VZDump::Plugin;
 use PVE::DataCenterConfig;
+use PVE::RemoteConfig;
 use PVE::SSHInfo;
 
 BEGIN {
@@ -50,8 +51,6 @@ BEGIN {
     }
 }
 
-use Data::Dumper; # fixme: remove
-
 use base qw(PVE::RESTHandler);
 
 my $opt_force_description = "Force physical removal. Without this, we simple remove the disk from the config file and create an additional configuration entry called 'unused[n]', which contains the volume ID. Unlink of unused[n] always cause physical removal.";
@@ -3754,6 +3753,199 @@ __PACKAGE__->register_method({
 
     }});
 
+__PACKAGE__->register_method({
+    name => 'remote_migrate_vm',
+    path => '{vmid}/remote_migrate',
+    method => 'POST',
+    protected => 1,
+    proxyto => 'node',
+    description => "Migrate virtual machine to a remote cluster. Creates a new migration task.",
+    permissions => {
+	check => ['perm', '/vms/{vmid}', [ 'VM.Migrate' ]],
+    },
+    parameters => {
+	additionalProperties => 0,
+	properties => {
+	    node => get_standard_option('pve-node'),
+	    vmid => get_standard_option('pve-vmid', { completion => \&PVE::QemuServer::complete_vmid }),
+	    'target-vmid' => get_standard_option('pve-vmid', { optional => 1 }),
+	    'target-node' => get_standard_option('pve-node', {
+		description => "Target node on remote cluster.",
+		completion =>  \&PVE::RemoteConfig::complete_remote_node,
+            }),
+	    'target-cluster' => get_standard_option('pve-remote-cluster', {
+		description => "Remote target cluster",
+		completion => \&PVE::RemoteConfig::complete_remote_cluster,
+	    }),
+	    online => {
+		type => 'boolean',
+		description => "Use online/live migration if VM is running. Ignored if VM is stopped.",
+		optional => 1,
+	    },
+	    'migration-network' => {
+		type => 'string', format => 'CIDR',
+		description => "CIDR of the (sub) network that is used for migration.",
+		optional => 1,
+	    },
+	    'with-local-disks' => {
+		type => 'boolean',
+		description => "Enable live storage migration for local disk",
+		optional => 1,
+	    },
+            targetstorage => get_standard_option('pve-targetstorage', {
+		completion => \&PVE::QemuServer::complete_migration_storage,
+		optional => 0,
+            }),
+	    targetbridge => {
+		type => 'string',
+		description => "Mapping from source to target bridges. Providing only a single bridge ID maps all source bridges to that bridge. Providing the special value '1' will map each source bridge to itself.",
+		format => 'bridgepair-list',
+	    },
+	    bwlimit => {
+		description => "Override I/O bandwidth limit (in KiB/s).",
+		optional => 1,
+		type => 'integer',
+		minimum => '0',
+		default => 'migrate limit from datacenter or storage config',
+	    },
+	},
+    },
+    returns => {
+	type => 'string',
+	description => "the task ID.",
+    },
+    code => sub {
+	my ($param) = @_;
+
+	my $rpcenv = PVE::RPCEnvironment::get();
+	my $authuser = $rpcenv->get_user();
+
+	my $source_vmid = extract_param($param, 'vmid');
+	my $target_cluster = extract_param($param, 'target-cluster');
+	my $target_node = extract_param($param, 'target-node');
+	my $target_vmid = extract_param($param, 'target-vmid') // $source_vmid;
+
+	my $localnode = PVE::INotify::nodename();
+	my $network = extract_param($param, 'migration-network');
+
+	PVE::Cluster::check_cfs_quorum();
+
+	raise_param_exc({ 'migration-network' => "Only root may use this option." })
+	    if $network && $authuser ne 'root@pam';
+
+	# test if VM exists
+	my $conf = PVE::QemuConfig->load_config($source_vmid);
+
+	PVE::QemuConfig->check_lock($conf);
+
+	raise_param_exc({ vmid => "cannot migrate HA-manage VM to remote cluster" })
+	    if PVE::HA::Config::vm_is_ha_managed($source_vmid);
+
+	my $remote_conf = PVE::RemoteConfig->new();
+
+	# TODO: check remote ACLs
+	my ($ip_info, $fp, $conn) = $remote_conf->get_remote_info($target_cluster, $target_node, $network);
+
+	die "Unable to determine remote IP\n"
+	    if !defined($ip_info) || !defined($ip_info->{default});
+
+	my $extra_ips = $ip_info->{extra} // [];
+	die "Unable to determine remote IP in migration network '$network'\n"
+	    if defined($network) && !@$extra_ips;
+
+	my $target_ip;
+	if (@$extra_ips) {
+	    $target_ip = $ip_info->{extra}[0];
+	    print "remote: selected IP '$target_ip' within '$network'.\n";
+	} else {
+	    $target_ip = $ip_info->{default};
+	    print "remote: selected default IP '$target_ip'.\n";
+	}
+
+	$conn->{host} = $target_ip;
+	$conn->{cached_fingerprints}->{$fp} = 1 if defined($fp);
+
+	my $api_client = PVE::APIClient::LWP->new(%$conn);
+	my $version = $api_client->get("/version");
+	print "remote: version '$version->{version}\n";
+
+	if (PVE::QemuServer::check_running($source_vmid)) {
+	    die "can't migrate running VM without --online\n" if !$param->{online};
+
+	    my $repl_conf = PVE::ReplicationConfig->new();
+	    my $is_replicated = $repl_conf->check_for_existing_jobs($source_vmid, 1);
+	    die "cannot remote-migrate replicated VM\n" if $is_replicated;
+	} else {
+	    warn "VM isn't running. Doing offline migration instead.\n" if $param->{online};
+	    $param->{online} = 0;
+	}
+
+	# FIXME: fork worker hear to avoid timeout? or poll these periodically
+	# in pvestatd and access cached info here? all of the below is actually
+	# checked at the remote end anyway once we call the mtunnel endpoint,
+	# we could also punt it to the client and not do it here at all..
+	my $resources = $api_client->get("/cluster/resources");
+	if (grep { defined($_->{vmid}) && $_->{vmid} eq $target_vmid } @$resources) {
+	    raise_param_exc({ target_vmid => "Guest with ID '$target_vmid' already exists on remote cluster" });
+	}
+
+	my $storages = [ grep { $_->{type} eq 'storage' && $_->{node} eq $target_node } @$resources ];
+	my $storecfg = PVE::Storage::config();
+	my $targetstorage = extract_param($param, 'targetstorage');
+	my $storagemap = eval { PVE::JSONSchema::parse_idmap($targetstorage, 'pve-storage-id') };
+	raise_param_exc({ targetstorage => "failed to parse storage map: $@" })
+	    if $@;
+
+	my $targetbridge = extract_param($param, 'targetbridge');
+	my $bridgemap = eval { PVE::JSONSchema::parse_idmap($targetbridge, 'pve-bridge-id') };
+	raise_param_exc({ targetbridge => "failed to parse bridge map: $@" })
+	    if $@;
+
+	my $check_remote_storage = sub {
+	    my ($storage) = @_;
+	    my $found = [ grep { $_->{storage} eq $storage } @$storages ];
+	    die "remote: storage '$storage' does not exist!\n"
+		if !@$found;
+
+	    $found = @$found[0];
+
+	    my $content_types = [ PVE::Tools::split_list($found->{content}) ];
+	    die "remote: storage '$storage' cannot store images\n"
+		if !grep { $_ eq 'images' } @$content_types;
+	};
+
+	foreach my $target_sid (values %{$storagemap->{entries}}) {
+	    $check_remote_storage->($target_sid);
+	}
+
+	$check_remote_storage->($storagemap->{default})
+	    if $storagemap->{default};
+
+	# TODO: or check all referenced storages?
+	die "remote migration requires explicit storage mapping!\n"
+	    if $storagemap->{identity};
+
+	$param->{storagemap} = $storagemap;
+	$param->{bridgemap} = $bridgemap;
+	$param->{remote} = {
+	    conn => $conn,
+	    client => $api_client,
+	    vmid => $target_vmid,
+	};
+	$param->{migration_type} = 'websocket';
+	$param->{migration_network} = $network if $network;
+
+	my $realcmd = sub {
+	    PVE::QemuMigrate->migrate($target_node, $target_ip, $source_vmid, $param);
+	};
+
+	my $worker = sub {
+	    return PVE::GuestHelpers::guest_migration_lock($source_vmid, 10, $realcmd);
+	};
+
+	return $rpcenv->fork_worker('qmigrate', $source_vmid, $authuser, $worker);
+    }});
+
 __PACKAGE__->register_method({
     name => 'monitor',
     path => '{vmid}/monitor',
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [RFC qemu-server++ 0/22] remote migration
  2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
                   ` (21 preceding siblings ...)
  2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 7/7] api: add remote migrate endpoint Fabian Grünbichler
@ 2021-04-15 14:04 ` alexandre derumier
  2021-04-15 14:32   ` Fabian Grünbichler
  22 siblings, 1 reply; 40+ messages in thread
From: alexandre derumier @ 2021-04-15 14:04 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

Hi,

thanks for working on this !

I'll be able to test it soon as I'll need to migrate 200-300 vms between 
2 datacenter soon.

I think it could be great to add optionnal "tag" option to targetbridge, 
as it could be different on target cluster.

Also, we should transfert vm firewall config.


On 13/04/2021 14:16, Fabian Grünbichler wrote:
> this series adds remote migration for VMs. there's still plenty of
> TODOs/FIXMEs/stuff that requires discussion, hence the RFC. live
> migration with NBD and storage-migrated disks should work already.
>
> the performance bottle neck (~190MB/s on loopback) for the websocket
> connection seems to be in pveproxy at the moment - the rust code should
> manage about 700MB/s.
>
> overview over affected repos and changes, see individual patches for
> more details.
>
> proxmox:
>
> some compatible changes to make websocket code usable for client-side
> connections, required by proxmox-websocket-tunnel
>
> proxmox-websocket-tunnel:
>
> new tunnel helper tool for forwarding commands and data over websocket
> connections, required by qemu-server on source side
> TODO: better error handling
> TODO: fingerprint checking/valid certs/..
> TODO: WS key generation
> TODO: decide on mask?
> TODO: investigate performance bottlenecks once PVE api server gets
> faster
>
> pve-access-control:
>
> new ticket type, required by qemu-server on target side
>
> pve-cluster:
>
> new remote.cfg and related helpers, required by qemu-server on source
> side
> TODO: ACLs, CLI, API for managing config
> TODO: handling of discovered nodes with valid certificates
> TODO: add additional information like default bwlimits, storage/bridge
> mappings
>
> pve-common:
>
> bridgepair format akin to storage pair, pve-bridge-id option, required
> by qemu-server
> TODO: adapt pve-container
>
> pve-guest-common:
>
> handle remote migration (no SSH) in AbstractMigrate,
> required by qemu-server
>
> pve-manager:
>
> new 'addr' endpoint for retrieving remote node IPs, required on target
> node
>
> pve-storage:
>
> extend 'pvesm import' to allow import from UNIX socket, required on
> target node by qemu-server
>
> qemu-server:
>
> some refactoring, new mtunnel endpoints, new remote_migration endpoints
> TODO: check remote ACLs
> TODO: handle pending changes and snapshots
> TODO: CLI for remote migration
> potential TODO: expose remote info via additional endpoints (resources? vmids?
> permissions? ...)
>
> as usual, some of the patches are best viewed with '-w', especially in
> qemu-server..
>
>
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [RFC qemu-server++ 0/22] remote migration
  2021-04-15 14:04 ` [pve-devel] [RFC qemu-server++ 0/22] remote migration alexandre derumier
@ 2021-04-15 14:32   ` Fabian Grünbichler
  2021-04-15 14:36     ` Thomas Lamprecht
                       ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-15 14:32 UTC (permalink / raw)
  To: alexandre derumier, Proxmox VE development discussion

On April 15, 2021 4:04 pm, alexandre derumier wrote:
> Hi,
> 
> thanks for working on this !
> 
> I'll be able to test it soon as I'll need to migrate 200-300 vms between 
> 2 datacenter soon.

looking forward to feedback :) you'll need to put the 
proxmox-websocket-tunnel binary into $PATH of pveproxy/qm, after 
building it with 'cargo build'.

if your inter-DC link is fast enough, you'll likely be hit by the 
pveproxy bottleneck. it would still be interesting to get some 
real-world numbers, I haven't tested with baremetal and fast storage 
yet.

please be aware that this is very much experimental code still!

> I think it could be great to add optionnal "tag" option to targetbridge, 
> as it could be different on target cluster.

hmm, we could have another (optional) map for VLAN tags? since tags and 
bridges are not one entity (you can have on interface on bridge A with 
tag X, and another interface on bridge A with tag Y, and those need to 
be mapped to bridge B with tag P and bridge B with tag Q, for example).

> Also, we should transfert vm firewall config.

yes, that's definitely true. another source of potential 
mismatches/things to check before migrating (security groups/aliases!)

> 
> On 13/04/2021 14:16, Fabian Grünbichler wrote:
>> this series adds remote migration for VMs. there's still plenty of
>> TODOs/FIXMEs/stuff that requires discussion, hence the RFC. live
>> migration with NBD and storage-migrated disks should work already.
>>
>> the performance bottle neck (~190MB/s on loopback) for the websocket
>> connection seems to be in pveproxy at the moment - the rust code should
>> manage about 700MB/s.
>>
>> overview over affected repos and changes, see individual patches for
>> more details.
>>
>> proxmox:
>>
>> some compatible changes to make websocket code usable for client-side
>> connections, required by proxmox-websocket-tunnel
>>
>> proxmox-websocket-tunnel:
>>
>> new tunnel helper tool for forwarding commands and data over websocket
>> connections, required by qemu-server on source side
>> TODO: better error handling
>> TODO: fingerprint checking/valid certs/..
>> TODO: WS key generation
>> TODO: decide on mask?
>> TODO: investigate performance bottlenecks once PVE api server gets
>> faster
>>
>> pve-access-control:
>>
>> new ticket type, required by qemu-server on target side
>>
>> pve-cluster:
>>
>> new remote.cfg and related helpers, required by qemu-server on source
>> side
>> TODO: ACLs, CLI, API for managing config
>> TODO: handling of discovered nodes with valid certificates
>> TODO: add additional information like default bwlimits, storage/bridge
>> mappings
>>
>> pve-common:
>>
>> bridgepair format akin to storage pair, pve-bridge-id option, required
>> by qemu-server
>> TODO: adapt pve-container
>>
>> pve-guest-common:
>>
>> handle remote migration (no SSH) in AbstractMigrate,
>> required by qemu-server
>>
>> pve-manager:
>>
>> new 'addr' endpoint for retrieving remote node IPs, required on target
>> node
>>
>> pve-storage:
>>
>> extend 'pvesm import' to allow import from UNIX socket, required on
>> target node by qemu-server
>>
>> qemu-server:
>>
>> some refactoring, new mtunnel endpoints, new remote_migration endpoints
>> TODO: check remote ACLs
>> TODO: handle pending changes and snapshots
>> TODO: CLI for remote migration
>> potential TODO: expose remote info via additional endpoints (resources? vmids?
>> permissions? ...)
>>
>> as usual, some of the patches are best viewed with '-w', especially in
>> qemu-server..
>>
>>
>> _______________________________________________
>> pve-devel mailing list
>> pve-devel@lists.proxmox.com
>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>
> 
> 




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [RFC qemu-server++ 0/22] remote migration
  2021-04-15 14:32   ` Fabian Grünbichler
@ 2021-04-15 14:36     ` Thomas Lamprecht
  2021-04-15 16:38     ` Moula BADJI
  2021-04-16  7:36     ` alexandre derumier
  2 siblings, 0 replies; 40+ messages in thread
From: Thomas Lamprecht @ 2021-04-15 14:36 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler,
	alexandre derumier

On 15.04.21 16:32, Fabian Grünbichler wrote:
>> I'll be able to test it soon as I'll need to migrate 200-300 vms between 
>> 2 datacenter soon.
> looking forward to feedback :) you'll need to put the 
> proxmox-websocket-tunnel binary into $PATH of pveproxy/qm, after 
> building it with 'cargo build'.

Just to be sure: ensure you build it with `cargo build --release` or else 
it
may be quite slow




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [RFC qemu-server++ 0/22] remote migration
  2021-04-15 14:32   ` Fabian Grünbichler
  2021-04-15 14:36     ` Thomas Lamprecht
@ 2021-04-15 16:38     ` Moula BADJI
  2021-05-05  6:02       ` aderumier
  2021-04-16  7:36     ` alexandre derumier
  2 siblings, 1 reply; 40+ messages in thread
From: Moula BADJI @ 2021-04-15 16:38 UTC (permalink / raw)
  To: pve-devel

It is a good initiative. I just want to add an idea on the migration of 
vms, with GPUs. I use K8S (K8s Workers with Nvidia GPU) on VMS deployed 
on PVE with Maas + Juju. But no possibility to migrate the vm even if 
the GPUs are identical on two different nodes. Maybe add a constraint 
with the fact of tagging the vms on MAAS like VM + GPU and check that 
the GPU is identical ... Thank you.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [RFC qemu-server++ 0/22] remote migration
  2021-04-15 14:32   ` Fabian Grünbichler
  2021-04-15 14:36     ` Thomas Lamprecht
  2021-04-15 16:38     ` Moula BADJI
@ 2021-04-16  7:36     ` alexandre derumier
  2 siblings, 0 replies; 40+ messages in thread
From: alexandre derumier @ 2021-04-16  7:36 UTC (permalink / raw)
  To: Fabian Grünbichler, Proxmox VE development discussion

>>looking forward to feedback :) you'll need to put the 
>>proxmox-websocket-tunnel binary into $PATH of pveproxy/qm, after 
>>building it with 'cargo build'.

oh, I didn't known that the rust version was already ready. great :)
I'll try to do my first rust build :) (I really need to learn it, seem to be a great language)


>>if your inter-DC link is fast enough, you'll likely be hit by the 
>>pveproxy bottleneck. it would still be interesting to get some 
>>real-world numbers, I haven't tested with baremetal and fast storage 
>>yet.

yes, I have enough bandwidth (25gbs), so no problem here. I'll try with && without any storage to compare.
(as I'm using ceph with nvme, I think it'll be the bottleneck with qemu mirror)

>>please be aware that this is very much experimental code still!

yes, sure , no problem ;)


On 15/04/2021 16:32, Fabian Grünbichler wrote:
> On April 15, 2021 4:04 pm, alexandre derumier wrote:
>> Hi,
>>
>> thanks for working on this !
>>
>> I'll be able to test it soon as I'll need to migrate 200-300 vms between
>> 2 datacenter soon.
> looking forward to feedback :) you'll need to put the
> proxmox-websocket-tunnel binary into $PATH of pveproxy/qm, after
> building it with 'cargo build'.
>
> if your inter-DC link is fast enough, you'll likely be hit by the
> pveproxy bottleneck. it would still be interesting to get some
> real-world numbers, I haven't tested with baremetal and fast storage
> yet.
>
> please be aware that this is very much experimental code still!
>
>> I think it could be great to add optionnal "tag" option to targetbridge,
>> as it could be different on target cluster.
> hmm, we could have another (optional) map for VLAN tags? since tags and
> bridges are not one entity (you can have on interface on bridge A with
> tag X, and another interface on bridge A with tag Y, and those need to
> be mapped to bridge B with tag P and bridge B with tag Q, for example).
>
>> Also, we should transfert vm firewall config.
> yes, that's definitely true. another source of potential
> mismatches/things to check before migrating (security groups/aliases!)
>
>> On 13/04/2021 14:16, Fabian Grünbichler wrote:
>>> this series adds remote migration for VMs. there's still plenty of
>>> TODOs/FIXMEs/stuff that requires discussion, hence the RFC. live
>>> migration with NBD and storage-migrated disks should work already.
>>>
>>> the performance bottle neck (~190MB/s on loopback) for the websocket
>>> connection seems to be in pveproxy at the moment - the rust code should
>>> manage about 700MB/s.
>>>
>>> overview over affected repos and changes, see individual patches for
>>> more details.
>>>
>>> proxmox:
>>>
>>> some compatible changes to make websocket code usable for client-side
>>> connections, required by proxmox-websocket-tunnel
>>>
>>> proxmox-websocket-tunnel:
>>>
>>> new tunnel helper tool for forwarding commands and data over websocket
>>> connections, required by qemu-server on source side
>>> TODO: better error handling
>>> TODO: fingerprint checking/valid certs/..
>>> TODO: WS key generation
>>> TODO: decide on mask?
>>> TODO: investigate performance bottlenecks once PVE api server gets
>>> faster
>>>
>>> pve-access-control:
>>>
>>> new ticket type, required by qemu-server on target side
>>>
>>> pve-cluster:
>>>
>>> new remote.cfg and related helpers, required by qemu-server on source
>>> side
>>> TODO: ACLs, CLI, API for managing config
>>> TODO: handling of discovered nodes with valid certificates
>>> TODO: add additional information like default bwlimits, storage/bridge
>>> mappings
>>>
>>> pve-common:
>>>
>>> bridgepair format akin to storage pair, pve-bridge-id option, required
>>> by qemu-server
>>> TODO: adapt pve-container
>>>
>>> pve-guest-common:
>>>
>>> handle remote migration (no SSH) in AbstractMigrate,
>>> required by qemu-server
>>>
>>> pve-manager:
>>>
>>> new 'addr' endpoint for retrieving remote node IPs, required on target
>>> node
>>>
>>> pve-storage:
>>>
>>> extend 'pvesm import' to allow import from UNIX socket, required on
>>> target node by qemu-server
>>>
>>> qemu-server:
>>>
>>> some refactoring, new mtunnel endpoints, new remote_migration endpoints
>>> TODO: check remote ACLs
>>> TODO: handle pending changes and snapshots
>>> TODO: CLI for remote migration
>>> potential TODO: expose remote info via additional endpoints (resources? vmids?
>>> permissions? ...)
>>>
>>> as usual, some of the patches are best viewed with '-w', especially in
>>> qemu-server..
>>>
>>>
>>> _______________________________________________
>>> pve-devel mailing list
>>> pve-devel@lists.proxmox.com
>>> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
>>>
>>



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [PATCH common 2/2] schema: add pve-bridge-id option/format/pair
  2021-04-13 12:16 ` [pve-devel] [PATCH common 2/2] schema: add pve-bridge-id option/format/pair Fabian Grünbichler
@ 2021-04-16  9:53   ` Thomas Lamprecht
  2021-04-16 10:10     ` Fabian Grünbichler
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Lamprecht @ 2021-04-16  9:53 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 13.04.21 14:16, Fabian Grünbichler wrote:
> for re-use in qemu-server/pve-container, which already have this option
> duplicated. the '-pair' is needed for remote migration, but can also be
> a nice addition to regular intra-cluster migration to lift the
> restriction of having identically named bridges.
> 

looks OK, one naming issue inline

> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
>  src/PVE/JSONSchema.pm | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
> index f2ddb50..bf30b33 100644
> --- a/src/PVE/JSONSchema.pm
> +++ b/src/PVE/JSONSchema.pm
> @@ -82,6 +82,12 @@ register_standard_option('pve-storage-id', {
>      type => 'string', format => 'pve-storage-id',
>  });
>  
> +register_standard_option('pve-bridge-id', {
> +    description => "Bridge to attach guest network devices to.",
> +    type => 'string', format => 'pve-bridge-id',
> +    format_description => 'bridge',
> +});
> +
>  register_standard_option('pve-config-digest', {
>      description => 'Prevent changes if current configuration file has different SHA1 digest. This can be used to prevent concurrent modifications.',
>      type => 'string',
> @@ -193,6 +199,17 @@ sub parse_storage_id {
>      return parse_id($storeid, 'storage', $noerr);
>  }
>  
> +PVE::JSONSchema::register_format('pve-bridge-id', \&parse_bridge_id);
> +sub parse_bridge_id {
> +    my ($id, $noerr) = @_;
> +
> +    if ($id !~ m/^[-_.\w\d]+$/) {
> +	return undef if $noerr;
> +	die "invalid bridge ID '$id'\n";
> +    }
> +    return $id;
> +}
> +
>  PVE::JSONSchema::register_format('acme-plugin-id', \&parse_acme_plugin_id);
>  sub parse_acme_plugin_id {
>      my ($pluginid, $noerr) = @_;
> @@ -293,6 +310,14 @@ sub verify_storagepair {
>      my ($storagepair, $noerr) = @_;
>      return $verify_idpair->($storagepair, $noerr, 'pve-storage-id');
>  }
> +
> +# note: this only checks a single list entry
> +# when using a bridgepair-list map, you need to pass the full parameter to
> +# parse_idmap
> +register_format('bridgepair', \&verify_bridgepair);

pve-bridge-id vs. bridgepair seems slightly odd as syntax choice?

Why not `bridge-pair` or even `pve-bridge-pair`?

> +sub verify_bridgepair {
> +    my ($bridgepair, $noerr) = @_;
> +    return $verify_idpair->($bridgepair, $noerr, 'pve-bridge-id');
>  }
>  
>  register_format('mac-addr', \&pve_verify_mac_addr);
> 





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [PATCH common 2/2] schema: add pve-bridge-id option/format/pair
  2021-04-16  9:53   ` Thomas Lamprecht
@ 2021-04-16 10:10     ` Fabian Grünbichler
  0 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-16 10:10 UTC (permalink / raw)
  To: Proxmox VE development discussion, Thomas Lamprecht

On April 16, 2021 11:53 am, Thomas Lamprecht wrote:
> On 13.04.21 14:16, Fabian Grünbichler wrote:
>> for re-use in qemu-server/pve-container, which already have this option
>> duplicated. the '-pair' is needed for remote migration, but can also be
>> a nice addition to regular intra-cluster migration to lift the
>> restriction of having identically named bridges.
>> 
> 
> looks OK, one naming issue inline
> 
>> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>> ---
>>  src/PVE/JSONSchema.pm | 25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>> 
>> diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
>> index f2ddb50..bf30b33 100644
>> --- a/src/PVE/JSONSchema.pm
>> +++ b/src/PVE/JSONSchema.pm
>> @@ -82,6 +82,12 @@ register_standard_option('pve-storage-id', {
>>      type => 'string', format => 'pve-storage-id',
>>  });
>>  
>> +register_standard_option('pve-bridge-id', {
>> +    description => "Bridge to attach guest network devices to.",
>> +    type => 'string', format => 'pve-bridge-id',
>> +    format_description => 'bridge',
>> +});
>> +
>>  register_standard_option('pve-config-digest', {
>>      description => 'Prevent changes if current configuration file has different SHA1 digest. This can be used to prevent concurrent modifications.',
>>      type => 'string',
>> @@ -193,6 +199,17 @@ sub parse_storage_id {
>>      return parse_id($storeid, 'storage', $noerr);
>>  }
>>  
>> +PVE::JSONSchema::register_format('pve-bridge-id', \&parse_bridge_id);
>> +sub parse_bridge_id {
>> +    my ($id, $noerr) = @_;
>> +
>> +    if ($id !~ m/^[-_.\w\d]+$/) {
>> +	return undef if $noerr;
>> +	die "invalid bridge ID '$id'\n";
>> +    }
>> +    return $id;
>> +}
>> +
>>  PVE::JSONSchema::register_format('acme-plugin-id', \&parse_acme_plugin_id);
>>  sub parse_acme_plugin_id {
>>      my ($pluginid, $noerr) = @_;
>> @@ -293,6 +310,14 @@ sub verify_storagepair {
>>      my ($storagepair, $noerr) = @_;
>>      return $verify_idpair->($storagepair, $noerr, 'pve-storage-id');
>>  }
>> +
>> +# note: this only checks a single list entry
>> +# when using a bridgepair-list map, you need to pass the full parameter to
>> +# parse_idmap
>> +register_format('bridgepair', \&verify_bridgepair);
> 
> pve-bridge-id vs. bridgepair seems slightly odd as syntax choice?
> 
> Why not `bridge-pair` or even `pve-bridge-pair`?

mainly because of the pre-existing 'storagepair', but I am fine with 
either (and also with changing storagepair - this series already touches 
so many repos that change can be mixed in without extra churn I think).

same for the API parameter(s) (I recycled the existing 'targetstorage', 
although I'd find 'target-FOO' more readable and I am also fine with 
using that variant for the new API call for all parameters)

> 
>> +sub verify_bridgepair {
>> +    my ($bridgepair, $noerr) = @_;
>> +    return $verify_idpair->($bridgepair, $noerr, 'pve-bridge-id');
>>  }
>>  
>>  register_format('mac-addr', \&pve_verify_mac_addr);
>> 
> 
> 




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [PATCH manager] API: add node address(es) API endpoint
  2021-04-13 12:16 ` [pve-devel] [PATCH manager] API: add node address(es) API endpoint Fabian Grünbichler
@ 2021-04-16 10:17   ` Thomas Lamprecht
  2021-04-16 11:37     ` Fabian Grünbichler
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Lamprecht @ 2021-04-16 10:17 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 13.04.21 14:16, Fabian Grünbichler wrote:
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
>  PVE/API2/Nodes.pm | 70 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
> 
> diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
> index ba6621c6..b30d0739 100644
> --- a/PVE/API2/Nodes.pm
> +++ b/PVE/API2/Nodes.pm
> @@ -222,6 +222,7 @@ __PACKAGE__->register_method ({
>  	my ($param) = @_;
>  
>  	my $result = [
> +	    { name => 'addr' },
>  	    { name => 'aplinfo' },
>  	    { name => 'apt' },
>  	    { name => 'capabilities' },
> @@ -2183,6 +2184,75 @@ __PACKAGE__->register_method ({
>  	return undef;
>      }});
>  
> +__PACKAGE__->register_method ({
> +    name => 'get_node_addr',
> +    path => 'addr',

hmm, not so sure if this the best path choice, if network wouldn't be templated by
the {iface} param that would be a better choice to avoid crowding here.

I pondered over moving that one a level deeper to network/if/{name} for a 
major releas,
and mode some things like netstat, possible even dns, under there - but it's work and
the gain was rather small - with this call which could fit there ROI would be slightly
bigger, but not too sure if worth it, just throwing out the idea there.

Besides that, the name could be slightly more descriptive `ip-addr` or even 
ip-addresses`?

> +    method => 'GET',
> +    proxyto => 'node',
> +    permissions => {
> +	check => ['perm', '/', [ 'Sys.Audit' ]],

why not `/nodes/{node}` ?

> +    },
> +    description => "Get the content of /etc/hosts.",

above is wrong? probably left-over from copying?

> +    parameters => {
> +	additionalProperties => 0,
> +	properties => {
> +	    node => get_standard_option('pve-node'),
> +	    cidr => {
> +		type => 'string',
> +		format => 'CIDR',
> +		format_description => 'CIDR',
> +		description => 'Extra network for which to retrieve local address(es).',
> +		optional => 1,
> +	    },
> +	},
> +    },
> +    returns => {
> +	type => 'object',
> +	properties => {
> +	    default => {
> +		type => 'string',
> +		description => 'Default node IP.',
> +		format => 'ip',
> +	    },
> +	    migration => {
> +		type => 'array',
> +		items => {
> +		    type => 'string',
> +		    description => 'Migration network IP(s).',
> +		    format => 'ip',
> +		},
> +		optional => 1,
> +	    },
> +	    extra => {
> +		type => 'array',
> +		items => {
> +		    type => 'string',
> +		    description => 'Extra network IP(s).',
> +		    format => 'ip',
> +		},
> +		optional => 1,
> +	    },
> +	},
> +    },
> +    code => sub {
> +	my ($param) = @_;
> +
> +	my $data = {};
> +
> +	my $default = PVE::Cluster::remote_node_ip($param->{node});
> +
> +	my $dc_conf = cfs_read_file('datacenter.cfg');
> +	my $migration = $dc_conf->{migration}->{network};
> +
> +	$data->{default} = $default if defined($default);
> +	$data->{migration} = PVE::Network::get_local_ip_from_cidr($migration, 1)
> +	    if $migration;

style nit, probably opinionated and thus really no hard feelings, but maybe the
following is lightly more legible due to declaration being nearer to usage:

if (my $default = PVE::Cluster::remote_node_ip($param->{node}) {
    $data->{default} = $default;
}

my $dc_conf = cfs_read_file('datacenter.cfg');
if (my $migration = $dc_conf->{migration}->{network}) {
    $data->{migration} = PVE::Network::get_local_ip_from_cidr($migration, 1);
}

if (my $cidr = $param->{cidr}) {
    $data->{extra} = PVE::Network::get_local_ip_from_cidr($param->{cidr}, 1)
}

> +	$data->{extra} = PVE::Network::get_local_ip_from_cidr($param->{cidr}, 1)
> +	    if $param->{cidr};
> +
> +	return $data;
> +    }});
> +
>  # bash completion helper
>  
>  sub complete_templet_repo {
> 





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] applied: [PATCH storage] import: allow import from UNIX socket
  2021-04-13 12:16 ` [pve-devel] [PATCH storage] import: allow import from UNIX socket Fabian Grünbichler
@ 2021-04-16 10:24   ` Thomas Lamprecht
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Lamprecht @ 2021-04-16 10:24 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 13.04.21 14:16, Fabian Grünbichler wrote:
> this allows forwarding over websockets without requiring a (free) port.
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
>  PVE/CLI/pvesm.pm | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] applied: [PATCH common 1/2] schema: pull out abstract 'id-pair' verifier
  2021-04-13 12:16 ` [pve-devel] [PATCH common 1/2] schema: pull out abstract 'id-pair' verifier Fabian Grünbichler
@ 2021-04-16 10:24   ` Thomas Lamprecht
  2021-04-19  8:43     ` [pve-devel] [PATCH common] fixup: remove double braces Stefan Reiter
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Lamprecht @ 2021-04-16 10:24 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 13.04.21 14:16, Fabian Grünbichler wrote:
> we'll need another one for guest bridge IDs
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
>  src/PVE/JSONSchema.pm | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [PATCH manager] API: add node address(es) API endpoint
  2021-04-16 10:17   ` Thomas Lamprecht
@ 2021-04-16 11:37     ` Fabian Grünbichler
  0 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-16 11:37 UTC (permalink / raw)
  To: Proxmox VE development discussion, Thomas Lamprecht

On April 16, 2021 12:17 pm, Thomas Lamprecht wrote:
> On 13.04.21 14:16, Fabian Grünbichler wrote:
>> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>> ---
>>  PVE/API2/Nodes.pm | 70 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 70 insertions(+)
>> 
>> diff --git a/PVE/API2/Nodes.pm b/PVE/API2/Nodes.pm
>> index ba6621c6..b30d0739 100644
>> --- a/PVE/API2/Nodes.pm
>> +++ b/PVE/API2/Nodes.pm
>> @@ -222,6 +222,7 @@ __PACKAGE__->register_method ({
>>  	my ($param) = @_;
>>  
>>  	my $result = [
>> +	    { name => 'addr' },
>>  	    { name => 'aplinfo' },
>>  	    { name => 'apt' },
>>  	    { name => 'capabilities' },
>> @@ -2183,6 +2184,75 @@ __PACKAGE__->register_method ({
>>  	return undef;
>>      }});
>>  
>> +__PACKAGE__->register_method ({
>> +    name => 'get_node_addr',
>> +    path => 'addr',
> 
> hmm, not so sure if this the best path choice, if network wouldn't be templated by
> the {iface} param that would be a better choice to avoid crowding here.
> 
> I pondered over moving that one a level deeper to network/if/{name} for a 
> major releas,
> and mode some things like netstat, possible even dns, under there - but it's work and
> the gain was rather small - with this call which could fit there ROI would be slightly
> bigger, but not too sure if worth it, just throwing out the idea there.

yeah, I would have liked to put it under network/, but didn't want to 
bother with the breaking change. with 7.0 it might be a good opportunity 
though.. maybe there is SDN stuff that could fit in as well?

> Besides that, the name could be slightly more descriptive `ip-addr` or even 
> ip-addresses`?

sure, could be done.

>> +    method => 'GET',
>> +    proxyto => 'node',
>> +    permissions => {
>> +	check => ['perm', '/', [ 'Sys.Audit' ]],
> 
> why not `/nodes/{node}` ?

not sure, but probably a good idea (also required to read the interface 
config which is about the same level of "secrecy" I'd say.

> 
>> +    },
>> +    description => "Get the content of /etc/hosts.",
> 
> above is wrong? probably left-over from copying?

yes.

> 
>> +    parameters => {
>> +	additionalProperties => 0,
>> +	properties => {
>> +	    node => get_standard_option('pve-node'),
>> +	    cidr => {
>> +		type => 'string',
>> +		format => 'CIDR',
>> +		format_description => 'CIDR',
>> +		description => 'Extra network for which to retrieve local address(es).',
>> +		optional => 1,
>> +	    },
>> +	},
>> +    },
>> +    returns => {
>> +	type => 'object',
>> +	properties => {
>> +	    default => {
>> +		type => 'string',
>> +		description => 'Default node IP.',
>> +		format => 'ip',
>> +	    },
>> +	    migration => {
>> +		type => 'array',
>> +		items => {
>> +		    type => 'string',
>> +		    description => 'Migration network IP(s).',
>> +		    format => 'ip',
>> +		},
>> +		optional => 1,
>> +	    },
>> +	    extra => {
>> +		type => 'array',
>> +		items => {
>> +		    type => 'string',
>> +		    description => 'Extra network IP(s).',
>> +		    format => 'ip',
>> +		},
>> +		optional => 1,
>> +	    },
>> +	},
>> +    },
>> +    code => sub {
>> +	my ($param) = @_;
>> +
>> +	my $data = {};
>> +
>> +	my $default = PVE::Cluster::remote_node_ip($param->{node});
>> +
>> +	my $dc_conf = cfs_read_file('datacenter.cfg');
>> +	my $migration = $dc_conf->{migration}->{network};
>> +
>> +	$data->{default} = $default if defined($default);
>> +	$data->{migration} = PVE::Network::get_local_ip_from_cidr($migration, 1)
>> +	    if $migration;
> 
> style nit, probably opinionated and thus really no hard feelings, but maybe the
> following is lightly more legible due to declaration being nearer to usage:
> 
> if (my $default = PVE::Cluster::remote_node_ip($param->{node}) {
>     $data->{default} = $default;
> }
> 
> my $dc_conf = cfs_read_file('datacenter.cfg');
> if (my $migration = $dc_conf->{migration}->{network}) {
>     $data->{migration} = PVE::Network::get_local_ip_from_cidr($migration, 1);
> }
> 
> if (my $cidr = $param->{cidr}) {
>     $data->{extra} = PVE::Network::get_local_ip_from_cidr($param->{cidr}, 1)
> }

LGTM

> 
>> +	$data->{extra} = PVE::Network::get_local_ip_from_cidr($param->{cidr}, 1)
>> +	    if $param->{cidr};
>> +
>> +	return $data;
>> +    }});
>> +
>>  # bash completion helper
>>  
>>  sub complete_templet_repo {
>> 
> 
> 




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [PATCH cluster 2/4] add get_remote_info
  2021-04-13 12:16 ` [pve-devel] [PATCH cluster 2/4] add get_remote_info Fabian Grünbichler
@ 2021-04-18 17:07   ` Thomas Lamprecht
  2021-04-19  7:48     ` Fabian Grünbichler
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Lamprecht @ 2021-04-18 17:07 UTC (permalink / raw)
  To: Proxmox VE development discussion, Fabian Grünbichler

On 13.04.21 14:16, Fabian Grünbichler wrote:
> as a unified helper for talking to a remote node. if the requested node
> has an entry in the remote config, the information from that entry is
> used.  else, the first locally defined node of the requested cluster is
> used as proxy.
> 
> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
> ---
>  data/PVE/RemoteConfig.pm | 55 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/data/PVE/RemoteConfig.pm b/data/PVE/RemoteConfig.pm
> index 23274de..7c395ba 100644
> --- a/data/PVE/RemoteConfig.pm
> +++ b/data/PVE/RemoteConfig.pm
> @@ -3,6 +3,7 @@ package PVE::RemoteConfig;
>  use strict;
>  use warnings;
>  
> +use PVE::APIClient::LWP;
>  use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
>  use PVE::JSONSchema qw(get_standard_option);
>  use PVE::Tools;
> @@ -158,6 +159,60 @@ sub lock {
>      }
>  }
>  
> +# will attempt to connect with node's locally defined endpoint if possible
> +sub get_remote_info {
> +    my ($self, $cluster, $node, $network_cidr) = @_;
> +
> +    my $cluster_info = $self->{ids}->{$cluster};
> +    die "Remote cluster '$cluster' is not defined!\n"
> +	if !defined($cluster_info) || $cluster_info->{type} ne 'pvecluster';
> +
> +    my $host = $node;
> +
> +    # fallback to random node/endpoint if $node is not locally defined
> +    if (!$cluster_info->{nodes}->{$node}) {
> +	my @defined_nodes = keys %{$cluster_info->{nodes}};
> +	$host = $defined_nodes[0];
> +    }
> +
> +    my $api_node = $self->{ids}->{$host};
> +
> +    my $api_token = $cluster_info->{token} // $api_node->{token};
> +
> +    my $conn_args = {
> +	username => 'root@pam',
> +	protocol => 'https',
> +	host => $api_node->{endpoint},
> +	apitoken => $api_token,
> +	port => 8006,
> +    };
> +
> +    if (my $fp = $api_node->{fingerprint}) {
> +	$conn_args->{cached_fingerprints} = { uc($fp) => 1 };
> +    } else {
> +	# FIXME add proper parameter to APIClient

that should now work out of the box? I.e., if no FP is passed we default to
verify_hostname = 1, and if verify_hostname is true we trust what openssl thinks
about the validity of the connection.

> +	die "IMPLEMENT ME";
> +	my $ssl_opts = {
> +	    verify_hostname => 1,
> +#	    SSL_ca_path => '/etc/ssl/certs',
> +	    SSL_verify_callback => 1,
> +	};
> +    }
> +
> +    print "Establishing API connection with cluster '$cluster' node '$host'\n";
> +
> +    my $conn = PVE::APIClient::LWP->new(%$conn_args);
> +
> +
> +    my $args = {};
> +    $args->{cidr} = $network_cidr if $network_cidr;
> +
> +    print "Request IP information of node '$node'\n";
> +    my $res = $conn->get("/nodes/$node/addr", $args);
> +
> +    return ($res, $conn_args);
> +}
> +
>  package PVE::RemoteConfig::Cluster;
>  
>  use PVE::RemoteConfig;
> 





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [PATCH cluster 2/4] add get_remote_info
  2021-04-18 17:07   ` Thomas Lamprecht
@ 2021-04-19  7:48     ` Fabian Grünbichler
  0 siblings, 0 replies; 40+ messages in thread
From: Fabian Grünbichler @ 2021-04-19  7:48 UTC (permalink / raw)
  To: Proxmox VE development discussion, Thomas Lamprecht

On April 18, 2021 7:07 pm, Thomas Lamprecht wrote:
> On 13.04.21 14:16, Fabian Grünbichler wrote:
>> as a unified helper for talking to a remote node. if the requested node
>> has an entry in the remote config, the information from that entry is
>> used.  else, the first locally defined node of the requested cluster is
>> used as proxy.
>> 
>> Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
>> ---
>>  data/PVE/RemoteConfig.pm | 55 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 55 insertions(+)
>> 
>> diff --git a/data/PVE/RemoteConfig.pm b/data/PVE/RemoteConfig.pm
>> index 23274de..7c395ba 100644
>> --- a/data/PVE/RemoteConfig.pm
>> +++ b/data/PVE/RemoteConfig.pm
>> @@ -3,6 +3,7 @@ package PVE::RemoteConfig;
>>  use strict;
>>  use warnings;
>>  
>> +use PVE::APIClient::LWP;
>>  use PVE::Cluster qw(cfs_register_file cfs_read_file cfs_write_file cfs_lock_file);
>>  use PVE::JSONSchema qw(get_standard_option);
>>  use PVE::Tools;
>> @@ -158,6 +159,60 @@ sub lock {
>>      }
>>  }
>>  
>> +# will attempt to connect with node's locally defined endpoint if possible
>> +sub get_remote_info {
>> +    my ($self, $cluster, $node, $network_cidr) = @_;
>> +
>> +    my $cluster_info = $self->{ids}->{$cluster};
>> +    die "Remote cluster '$cluster' is not defined!\n"
>> +	if !defined($cluster_info) || $cluster_info->{type} ne 'pvecluster';
>> +
>> +    my $host = $node;
>> +
>> +    # fallback to random node/endpoint if $node is not locally defined
>> +    if (!$cluster_info->{nodes}->{$node}) {
>> +	my @defined_nodes = keys %{$cluster_info->{nodes}};
>> +	$host = $defined_nodes[0];
>> +    }
>> +
>> +    my $api_node = $self->{ids}->{$host};
>> +
>> +    my $api_token = $cluster_info->{token} // $api_node->{token};
>> +
>> +    my $conn_args = {
>> +	username => 'root@pam',
>> +	protocol => 'https',
>> +	host => $api_node->{endpoint},
>> +	apitoken => $api_token,
>> +	port => 8006,
>> +    };
>> +
>> +    if (my $fp = $api_node->{fingerprint}) {
>> +	$conn_args->{cached_fingerprints} = { uc($fp) => 1 };
>> +    } else {
>> +	# FIXME add proper parameter to APIClient
> 
> that should now work out of the box? I.e., if no FP is passed we default to
> verify_hostname = 1, and if verify_hostname is true we trust what openssl thinks
> about the validity of the connection.

I didn't test it (and the tunnel binary itself still lacks that 
functionality for sure), but that comment is leftover (only slightly 
moved/reworded) from last year's PoC, so it's possible that the LWP 
client handles this well nowadays :)

> 
>> +	die "IMPLEMENT ME";
>> +	my $ssl_opts = {
>> +	    verify_hostname => 1,
>> +#	    SSL_ca_path => '/etc/ssl/certs',
>> +	    SSL_verify_callback => 1,
>> +	};
>> +    }
>> +
>> +    print "Establishing API connection with cluster '$cluster' node '$host'\n";
>> +
>> +    my $conn = PVE::APIClient::LWP->new(%$conn_args);
>> +
>> +
>> +    my $args = {};
>> +    $args->{cidr} = $network_cidr if $network_cidr;
>> +
>> +    print "Request IP information of node '$node'\n";
>> +    my $res = $conn->get("/nodes/$node/addr", $args);
>> +
>> +    return ($res, $conn_args);
>> +}
>> +
>>  package PVE::RemoteConfig::Cluster;
>>  
>>  use PVE::RemoteConfig;
>> 
> 
> 




^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] [PATCH common] fixup: remove double braces
  2021-04-16 10:24   ` [pve-devel] applied: " Thomas Lamprecht
@ 2021-04-19  8:43     ` Stefan Reiter
  2021-04-19  9:56       ` [pve-devel] applied: " Thomas Lamprecht
  0 siblings, 1 reply; 40+ messages in thread
From: Stefan Reiter @ 2021-04-19  8:43 UTC (permalink / raw)
  To: pve-devel; +Cc: t.lamprecht, f.gruenbichler

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

either this or apply the second patch soon please ;)
https://lists.proxmox.com/pipermail/pve-devel/2021-April/047594.html

 src/PVE/JSONSchema.pm | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/PVE/JSONSchema.pm b/src/PVE/JSONSchema.pm
index f2ddb50..3febc1c 100644
--- a/src/PVE/JSONSchema.pm
+++ b/src/PVE/JSONSchema.pm
@@ -293,7 +293,6 @@ sub verify_storagepair {
     my ($storagepair, $noerr) = @_;
     return $verify_idpair->($storagepair, $noerr, 'pve-storage-id');
 }
-}
 
 register_format('mac-addr', \&pve_verify_mac_addr);
 sub pve_verify_mac_addr {
-- 
2.20.1





^ permalink raw reply	[flat|nested] 40+ messages in thread

* [pve-devel] applied: [PATCH common] fixup: remove double braces
  2021-04-19  8:43     ` [pve-devel] [PATCH common] fixup: remove double braces Stefan Reiter
@ 2021-04-19  9:56       ` Thomas Lamprecht
  0 siblings, 0 replies; 40+ messages in thread
From: Thomas Lamprecht @ 2021-04-19  9:56 UTC (permalink / raw)
  To: Stefan Reiter, pve-devel

On 19.04.21 10:43, Stefan Reiter wrote:
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
> 
> either this or apply the second patch soon please ;)
> https://lists.proxmox.com/pipermail/pve-devel/2021-April/047594.html
> 
>  src/PVE/JSONSchema.pm | 1 -
>  1 file changed, 1 deletion(-)
> 
>

applied, thanks!




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [RFC qemu-server++ 0/22] remote migration
  2021-04-15 16:38     ` Moula BADJI
@ 2021-05-05  6:02       ` aderumier
  2021-05-05  9:22         ` Dominik Csapak
  0 siblings, 1 reply; 40+ messages in thread
From: aderumier @ 2021-05-05  6:02 UTC (permalink / raw)
  To: Proxmox VE development discussion

Hi Moula,

local device migration is not related to this remote migration serie,
but maybe some improvement could be done.

I'm think about usb device, where we could have the same device on
multiple hosts. (like a security dongle for example).

I think for usb we should be able to detach/migrate/rettach. (it should
work for stateless device, but not a usb drive for example)

Maybe for local nic pci passthroug too.

But for gpu I'm really not sure it's possible, if you have some data in
the local gpu memory, I don't see how it's possible to get it working.

Do you known an hypervisor with this kind of gpu migration
implementation ?





Le jeudi 15 avril 2021 à 18:38 +0200, Moula BADJI a écrit :
> It is a good initiative. I just want to add an idea on the migration
> of 
> vms, with GPUs. I use K8S (K8s Workers with Nvidia GPU) on VMS
> deployed 
> the GPU is identical ... Thank you.
> 
> _______________________________________________
> pve-devel mailing list
> pve-devel@lists.proxmox.com
> https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
> 





^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [pve-devel] [RFC qemu-server++ 0/22] remote migration
  2021-05-05  6:02       ` aderumier
@ 2021-05-05  9:22         ` Dominik Csapak
  0 siblings, 0 replies; 40+ messages in thread
From: Dominik Csapak @ 2021-05-05  9:22 UTC (permalink / raw)
  To: pve-devel

On 5/5/21 08:02, aderumier@odiso.com wrote:
> Hi Moula,
> 
> local device migration is not related to this remote migration serie,
> but maybe some improvement could be done.
> 
> I'm think about usb device, where we could have the same device on
> multiple hosts. (like a security dongle for example).
> 
> I think for usb we should be able to detach/migrate/rettach. (it should
> work for stateless device, but not a usb drive for example)
> 
> Maybe for local nic pci passthroug too.
> 
> But for gpu I'm really not sure it's possible, if you have some data in
> the local gpu memory, I don't see how it's possible to get it working.
> 
> Do you known an hypervisor with this kind of gpu migration
> implementation ?
> 

afaik, this is being worked on in qemu/kvm but needs hardware support,
i.e. you can dump the internal state of a supported pci device,
and migrate it to the target

will probably still take a while until normal (non big datacenter)
customers can use this...




^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2021-05-05  9:23 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13 12:16 [pve-devel] [RFC qemu-server++ 0/22] remote migration Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH proxmox 1/2] websocket: make field public Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH proxmox 2/2] websocket: adapt for client connection Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH proxmox-websocket-tunnel 1/2] initial commit Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH proxmox-websocket-tunnel 2/2] add tunnel implementation Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH access-control 1/2] tickets: add tunnel ticket Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH access-control 2/2] ticket: normalize path for verification Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH cluster 1/4] remote.cfg: add new config file Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH cluster 2/4] add get_remote_info Fabian Grünbichler
2021-04-18 17:07   ` Thomas Lamprecht
2021-04-19  7:48     ` Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH cluster 3/4] remote: add option/completion Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH cluster 4/4] get_remote_info: also return FP if available Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH common 1/2] schema: pull out abstract 'id-pair' verifier Fabian Grünbichler
2021-04-16 10:24   ` [pve-devel] applied: " Thomas Lamprecht
2021-04-19  8:43     ` [pve-devel] [PATCH common] fixup: remove double braces Stefan Reiter
2021-04-19  9:56       ` [pve-devel] applied: " Thomas Lamprecht
2021-04-13 12:16 ` [pve-devel] [PATCH common 2/2] schema: add pve-bridge-id option/format/pair Fabian Grünbichler
2021-04-16  9:53   ` Thomas Lamprecht
2021-04-16 10:10     ` Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH guest-common] migrate: handle migration_network with remote migration Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH manager] API: add node address(es) API endpoint Fabian Grünbichler
2021-04-16 10:17   ` Thomas Lamprecht
2021-04-16 11:37     ` Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH storage] import: allow import from UNIX socket Fabian Grünbichler
2021-04-16 10:24   ` [pve-devel] applied: " Thomas Lamprecht
2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 1/7] migrate: factor out storage checks Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 2/7] refactor map_storage to map_id Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 3/7] schema: use pve-bridge-id Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 4/7] mtunnel: add API endpoints Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 5/7] migrate: refactor remote VM/tunnel start Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 6/7] migrate: add remote migration handling Fabian Grünbichler
2021-04-13 12:16 ` [pve-devel] [PATCH qemu-server 7/7] api: add remote migrate endpoint Fabian Grünbichler
2021-04-15 14:04 ` [pve-devel] [RFC qemu-server++ 0/22] remote migration alexandre derumier
2021-04-15 14:32   ` Fabian Grünbichler
2021-04-15 14:36     ` Thomas Lamprecht
2021-04-15 16:38     ` Moula BADJI
2021-05-05  6:02       ` aderumier
2021-05-05  9:22         ` Dominik Csapak
2021-04-16  7:36     ` alexandre derumier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox
Service provided by Proxmox Server Solutions GmbH | Privacy | Legal