[Ground-station] COBS Decoder: Human vs OpenAI GPT
Michelle Thompson
mountain.michelle at gmail.com
Mon Dec 5 02:33:28 PST 2022
Thank you, Mark.
It’s posted to the ORI website as a blog entry, but there are no plans to
publish it elsewhere. We do not mind if it’s forwarded to Hack-a-Day.
I’ll be traveling through the 10th, so would you please take the lead on a
submission?
-Michelle
On Sun, Dec 4, 2022 at 22:43 Mark Whittington <markwhi at gmail.com> wrote:
> Michelle and Paul:
>
> This was a fascinating read! I have seen a lot of coverage of AI-generated
> code (mostly GitHub's Copilot) but this is the first test case that I've
> seen involving an HDL. Do you plan to publish this work somewhere other
> than this list? Would you mind if I forward it to the editors at
> Hack-a-Day, or would you consider doing so yourselves?
>
> Thank you both for the write-up and to anyone else involved!
>
> On Sat, Dec 3, 2022, 1:31 PM Michelle Thompson via Ground-Station
> <ground-station at lists.openresearch.institute> wrote:
>
>> Greetings all,
>>
>> You know that ORI has been active in a variety of regulatory work for
>> several years now. One of those areas is AI/ML and its effect on
>> communications. We served on the US FCC TAC in 2022, presented on
>> the subject, and are working towards an AI/ML Handbook for Amateur
>> Radio.
>>
>> You may have seen news about OpenAI GPT, or ChatGPT.
>>
>> A lot of people are testing this chatbot for reverse engineering and
>> for writing software. There have been some successes in cases where
>> the model would have had access to code or documentation about
>> protocols, observations, or technical behavior during training, and
>> where the requests match up well with what the model learned.
>>
>> For example, a friend asked the chatbot to create a Python script to
>> access the webcam on their computer, and they shared a still image
>> from the resulting feed. Another friend got better reverse-engineering
>> results from the chatbot than they had gotten with either IDA or
>> Ghidra over many weeks.
>>
>> So, how does ChatGPT handle VHDL?
>>
>> We have something specific to compare against, since we just completed
>> a COBS decoder in VHDL for our open source HEO/GEO transponder work
>> for Haifuraiya.
>>
>> Let's start off simple. Here's the first dialogue with the chatbot,
>> and the results.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> Abraxas3d: Design a COBS decoder in VHDL
>>
>> Sure, here's an example of a COBS decoder in VHDL:
>>
>>
>>
>> library IEEE;
>> use IEEE.STD_LOGIC_1164.ALL;
>>
>> entity COBS_decoder is
>> Port (
>> input_data : in std_logic_vector;
>> decoded_data : out std_logic_vector
>> );
>> end COBS_decoder;
>>
>> architecture Behavioral of COBS_decoder is
>> -- Declare a variable to keep track of the current index
>> -- in the input and output data streams
>> variable current_index : integer := 1;
>>
>> -- Declare a variable to store the length of the
>> -- run of zeros in the input data
>> variable zero_run_length : integer := 0;
>> begin
>> -- Iterate over the input data
>> for i in input_data'range loop
>> -- If the current input value is a zero, increment
>> -- the zero run length variable
>> if input_data(i) = '0' then
>> zero_run_length := zero_run_length + 1;
>> -- Otherwise, the current input value is not a zero,
>> -- so we've reached the end of the zero run
>> else
>> -- Set the decoded value at the current index to
>> -- the length of the zero run
>> decoded_data(current_index) :=
>> conv_std_logic_vector(zero_run_length, 8);
>>
>> -- Increment the current index and reset the zero
>> -- run length for the next iteration
>> current_index := current_index + 1;
>> zero_run_length := 0;
>> end if;
>> end loop;
>> end Behavioral;
>>
>> This code decodes the input data using the COBS (Consistent Overhead
>> Byte Stuffing) algorithm, which involves replacing runs of zeros in
>> the input data with a single byte that indicates the length of the
>> zero run. For example, the input '101001000' would be decoded as
>> '103101'.
>>
>> This decoder iterates over the input data and maintains a variable
>> zero_run_length that is incremented for each zero in the input data.
>> When a non-zero value is encountered, the length of the zero run is
>> written to the output data and the zero_run_length variable is reset.
>>
>> I hope this helps! Let me know if you have any other questions.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> It knew what COBS meant. It produced code, with comments, and a brief
>> "design document", with examples.
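For contrast with the chatbot's description above (replacing runs of zeros with a length byte), here is what COBS decoding actually involves. Each code byte n is followed by n-1 literal, nonzero data bytes and then an implied zero, except that a code of 0xFF implies no zero; a real 0x00 byte appears only as the frame delimiter. The Python below is a minimal sketch for a single packet with its delimiter already stripped, assuming well-formed input; `cobs_decode` is an illustrative name, not part of ORI's code.

```python
def cobs_decode(packet: bytes) -> bytes:
    """Decode one COBS packet whose trailing 0x00 delimiter was removed."""
    out = bytearray()
    i = 0
    while i < len(packet):
        code = packet[i]
        if code == 0:
            raise ValueError("0x00 is only valid as the frame delimiter")
        end = i + code
        if end > len(packet):
            raise ValueError("code byte points past the end of the packet")
        out += packet[i + 1:end]   # the code - 1 literal (nonzero) data bytes
        i = end
        # Every group except a 0xFF group implies a zero after its data;
        # the implied zero of the last group is not part of the payload.
        if code != 0xFF and i < len(packet):
            out.append(0)
    return bytes(out)


# The payload 11 22 00 33 is encoded as 03 11 22 02 33 (plus the delimiter)
print(cobs_decode(bytes([0x03, 0x11, 0x22, 0x02, 0x33])).hex(" "))  # 11 22 00 33
```

Nothing in a COBS decoder counts zeros: the zeros are restored from the code bytes, and everything else passes through unchanged.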
>>
>> Our open source HEO transponder project uses AXI-based blocks in the
>> FPGA. So, let's add AXI to the request.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> Abraxas3d: Design a COBS decoder with AXI interface in VHDL
>>
>> Sure, here's an example of a COBS decoder with an AXI interface in VHDL:
>>
>> library IEEE;
>> use IEEE.STD_LOGIC_1164.ALL;
>> use IEEE.NUMERIC_STD.ALL;
>>
>> -- AXI interface definitions
>> library unisim;
>> use unisim.vcomponents.all;
>>
>> entity COBS_decoder is
>> Port (
>> AXI_IN : in AXI4Stream;
>> AXI_OUT : out AXI4Stream
>> );
>> end COBS_decoder;
>>
>> architecture Behavioral of COBS_decoder is
>> -- Declare a variable to keep track of the current index
>> -- in the input and output data streams
>> variable current_index : integer := 1;
>>
>> -- Declare a variable to store the length of the
>> -- run of zeros in the input data
>> variable zero_run_length : integer := 0;
>> begin
>> -- Iterate over the input data
>> for i in AXI_IN.data'range loop
>> -- If the current input value is a zero, increment
>> -- the zero run length variable
>> if AXI_IN.data(i) = '0' then
>> zero_run_length := zero_run_length + 1;
>> -- Otherwise, the current input value is not a zero,
>> -- so we've reached the end of the zero run
>> else
>> -- Set the decoded value at the current index to
>> -- the length of the zero run
>> AXI_OUT.data(current_index) :=
>> conv_std_logic_vector(zero_run_length, 8);
>>
>> -- Increment the current index and reset the zero
>> -- run length for the next iteration
>> current_index := current_index + 1;
>> zero_run_length := 0;
>> end if;
>> end loop;
>>
>> -- Set the valid and last flags in the output AXI stream
>> AXI_OUT.valid <= '1';
>> AXI_OUT.last <= '1';
>> end Behavioral;
>>
>> This code is similar to the previous example, but it uses an AXI
>> (Advanced eXtensible Interface) interface for the input and output
>> data. The AXI interface is a standard interface used in many FPGA
>> designs, and it provides a way to stream data into and out of the
>> decoder.
>>
>> In this code, the input data is read from the AXI_IN.data port, and
>> the decoded output is written to the AXI_OUT.data port. The valid and
>> last flags are also set in the output AXI stream to indicate that the
>> data is valid and that it is the last data in the stream.
>>
>> I hope this helps! Let me know if you have any other questions.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> It remembered the previous request and updated the design.
>>
>> Here's a review of the COBS implementation from Paul Williamson KB5MU.
>>
>> "I don’t know what this is, but it isn’t a COBS decoder with AXI
>> interfaces.
>>
>> It does seem at first glance to be some sort of thing with AXI
>> interfaces, but it does not implement an AXI interface at all. It just
>> relies on a (possibly imaginary) library type called AXI4Stream that
>> somehow magically does AXI interfaces, in a way that’s both
>> implausibly smart and inexplicably dumb. I think the classic phrase
>> “not even wrong” applies here.
>>
>> The AXI4Stream object apparently has a member .data that is an array
>> of bytes, which might make sense if the entire AXI transfer is being
>> treated as a single entity here. The code iterates through the input
>> .data using the ’range attribute, as if the AXI4Stream object has a
>> defined fixed transfer length.
>>
>> Which it can’t have, if it’s a general purpose AXI stream implementation.
>>
>> So, if this was ever anything real, it must have been a special
>> purpose AXI implementation with a fixed transfer length. Assuming that
>> the AXI4Stream .data actually works that way, the entity does
>> do some sort of transformation on the input data to create the output
>> data.
>>
>> We asked for that to be a COBS decoder, but the function that it
>> actually implements bears only the slightest resemblance to a COBS
>> decoder. What it actually does is output a stream of words that are
>> the lengths of the runs of zeroes in the input data. Nonzero data does
>> not get passed through at all.
>>
>> Instead, a run of nonzero bytes gets translated into a run of zeroes,
>> as if two adjacent nonzero bytes meant nothing but a zero-length run
>> of zeroes.
>>
>> Here’s an example of what this code seems to do:
>> Input .data: 1 2 3 4 0 0 0 0 1 2 3 4
>> Output .data: 0 0 0 0 4 0 0 0
>>
>> That’s certainly not a COBS decoder, or even anything useful at all.
>>
>> Note that the output is variable length, but never longer than the input.
>> We had to assume that the AXI4Stream object had a fixed .data size to
>> get this far, but now it has to have a variable data size for the
>> output.
>>
>> Supposing we had successfully suspended disbelief up to here, we now
>> examine the final two lines of code in the Behavioral architecture. It
>> sets the .valid and .last members of the AXI_OUT port to ‘1’.
>>
>> That makes absolutely no sense if it’s supposed to encapsulate an
>> entire AXI stream transfer. Those signals only make sense if we’re
>> treating one word at a time, as you’d expect to see in a normal,
>> general-purpose AXI stream object.
>>
>> The extremely verbose commenting style in this code suggests that the
>> original source material was from an introductory tutorial. Maybe the
>> AI has been tuned to favor heavily commented code, and ends up picking
>> up partially completed junk from the middle of tutorials?
>> Anyhow, this is not an example of automated programming in response to
>> a high-level query. This is gibberish that succeeds only in having a
>> superficial resemblance to the code requested."
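Paul's reading of the generated code is easy to check with a few lines of Python. This sketch follows his assumption that `.data` is an array of bytes (the VHDL itself compares each element against the literal '0', as if the elements were bits); `chatgpt_decoder_model` is a hypothetical name for the model, not anything ChatGPT produced.

```python
def chatgpt_decoder_model(data):
    """Model of the generated code's loop, per the review: emit the length of
    the zero run preceding each nonzero element; nonzero values are dropped."""
    out = []
    zero_run_length = 0
    for value in data:
        if value == 0:
            zero_run_length += 1
        else:
            out.append(zero_run_length)
            zero_run_length = 0
    return out  # zeros at the end of the input produce no output at all


print(chatgpt_decoder_model([1, 2, 3, 4, 0, 0, 0, 0, 1, 2, 3, 4]))
# [0, 0, 0, 0, 4, 0, 0, 0]
```

The model reproduces Paul's example, and it shows why the output is never longer than the input: at most one value comes out per nonzero input element, and trailing zeros emit nothing.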
>>
>> So, does this work?
>>
>> Let's try it out in the lab!
>>
>> The first thing we notice is that there is an obvious syntax error
>> with a missing semicolon. Aside from that, there is no AXI4Stream port
>> type, so it isn't recognized. Vivado appears to be disappointed in
>> this file, and wants little to do with this code. That's where we
>> should have stopped, but I spent some time looking to see if the ports
>> were at least close to being declared correctly.
>>
>> The model appears to have taken some custom type from somewhere,
>> perhaps a record, and come up with its own AXI4Stream type. The
>> unisim library doesn't define any such type, and getting that right
>> is the sort of thing that automated code writing needs to do.
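For comparison with the imaginary AXI4Stream type, a real AXI4-Stream link (like the s_t* and m_t* ports on the ORI decoder) carries one word per clock, with TVALID, TREADY, and TLAST as separate signals. A transfer happens only on a rising clock edge where TVALID and TREADY are both high, and TLAST marks the final transfer of a packet. The Python below is a behavioral illustration of that handshake, not simulation or vendor code; `Beat` and `collect_packets` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    """Signal values sampled at one rising clock edge on an AXI4-Stream link."""
    tdata: int
    tvalid: bool
    tready: bool
    tlast: bool = False


def collect_packets(beats):
    """Group transferred words into packets.

    A word transfers only when tvalid and tready are both high; tlast on a
    transferred word closes the current packet.
    """
    packets, current = [], []
    for beat in beats:
        if beat.tvalid and beat.tready:
            current.append(beat.tdata)
            if beat.tlast:
                packets.append(current)
                current = []
    return packets


trace = [
    Beat(0x11, tvalid=True, tready=True),
    Beat(0x22, tvalid=True, tready=False),   # receiver stalls: no transfer
    Beat(0x22, tvalid=True, tready=True),
    Beat(0x99, tvalid=False, tready=True),   # idle cycle: tdata is ignored
    Beat(0x33, tvalid=True, tready=True, tlast=True),
]
print(collect_packets(trace))  # [[17, 34, 51]]
```

Because valid, ready, and last are per-word signals, setting them once for an entire "stream" object, as the generated code does, has no meaning.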
>>
>> Let's look at the code written by humans from ORI.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>>
>> ----------------------------------------------------------------------------------
>> -- Company: Open Research Institute, Inc.
>> -- Engineer: Skunkwrx, Abraxas3d
>> --
>> -- Design Name: COBS protocol decoder
>> -- Module Name: decoder - Behavioral
>> -- Project Name: Phase 4 "Haifuraiya"
>> -- Target Devices: 7000 Zynq
>> -- Tool Versions: 2021.1
>> -- Description: COBS protocol decoder.
>> -- https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing
>> --
>> -- Dependencies:
>> --
>> -- Additional Comments: This work is Open Source and licensed using
>> -- CERN OHL v2.0
>> --
>>
>> ----------------------------------------------------------------------------------
>>
>>
>> library IEEE;
>> use IEEE.STD_LOGIC_1164.ALL;
>>
>> use IEEE.NUMERIC_STD.ALL;
>>
>> --Entity Declaration
>> entity decoder is
>> Port ( rst : in STD_LOGIC;
>> clk : in STD_LOGIC;
>> s_tdata : in STD_LOGIC_VECTOR (7 downto 0);
>> s_tlast : in STD_LOGIC;
>> s_tvalid : in STD_LOGIC;
>> s_tready : out STD_LOGIC;
>> m_tdata : out STD_LOGIC_VECTOR (7 downto 0);
>> m_tlast : out STD_LOGIC;
>> m_tvalid : out STD_LOGIC;
>> m_tready : in STD_LOGIC);
>> end decoder;
>>
>> --Architecture
>> architecture Behavioral of decoder is
>>
>> -- internal copy of s_tdata
>> signal input_data : STD_LOGIC_VECTOR (7 downto 0);
>>
>> -- s_tdata delayed by one clk where s_tvalid is high
>> signal input_data_d : STD_LOGIC_VECTOR (7 downto 0);
>>
>> -- internal version of s_tlast
>> -- not used for anything
>> -- We create our own m_tlast from the COBS framing instead
>> signal s_tlast_i : STD_LOGIC;
>>
>> -- internal version of input handshaking signal s_tvalid
>> signal s_tvalid_i : STD_LOGIC;
>>
>> -- s_tvalid delayed by one clk
>> -- thus, a version of s_tvalid aligned with input_data_d
>> signal s_tvalid_i_d : STD_LOGIC;
>>
>> -- s_tvalid delayed by two clks
>> -- thus, a version of s_tvalid aligned with output_data
>> signal s_tvalid_i_d_d : STD_LOGIC;
>>
>> -- countdown of bytes in this sequence
>> -- loads from input_data_d when it represents the frame count
>> -- as determined by signal counter_load
>> -- contains the count while m_tdata contains the implied 0 at
>> -- the end of a sequence, or the next count that replaced it
>> -- counts down thereafter, to contain 1 when the last non-zero
>> -- byte of the sequence is on m_tdata
>> -- allowed to count down to 0, but not beyond
>> signal count : STD_LOGIC_VECTOR (7 downto 0);
>>
>> -- enable to load count from input_data_d on this clk edge
>> -- two cases detected:
>> -- * first valid non-zero byte after a frame separator
>> -- * first valid byte after count is exhausted
>> -- allowed to be high for multiple cycles
>> signal counter_load : STD_LOGIC;
>>
>> -- counter_load delayed by one clk where s_tvalid is high
>> -- used to identify the first valid data byte of any sequence,
>> -- for purposes of computing m_tvalid (via pre_tvalid)
>> signal counter_load_d : STD_LOGIC;
>>
>> -- detection of a valid frame separator (zero) byte in input_data_d
>> signal frame_sep : STD_LOGIC;
>>
>> -- frame_sep delayed by one clk where s_tvalid is high
>> -- used to compute counter_load
>> -- used to compute rising edge of pre_tvalid
>> signal frame_sep_d : STD_LOGIC;
>>
>> -- frame_sep_d delayed by an additional clk (not depending on
>> -- s_tvalid)
>> -- used to find the first non-zero byte of the new frame
>> signal frame_sep_d_d : STD_LOGIC;
>>
>> -- move the frame_sep signal that occurred during m_tready low
>> -- out to the first cycle when m_tready is high again
>> signal use_saved_frame_sep : STD_LOGIC;
>>
>> -- flag to remember that the frame count for this sequence was 255,
>> -- to handle the special case that such a sequence does not have
>> -- an implied zero byte at the end.
>> -- set when loading count with 255
>> -- cleared when the counter is reloaded with anything else
>> signal case_255 : STD_LOGIC;
>>
>> -- internal version of m_tdata output
>> signal output_data : STD_LOGIC_VECTOR (7 downto 0);
>>
>> -- internal version of m_tlast output
>> -- high when the last byte of a frame is valid on m_tdata
>> signal m_tlast_i : STD_LOGIC;
>>
>> -- delayed versions of m_tlast
>> signal m_tlast_i_d : STD_LOGIC;
>> signal m_tlast_i_d_d : STD_LOGIC;
>>
>> -- intermediate result for m_tvalid.
>> -- high across all data bytes of each sequence on m_tdata
>> -- does not go low for bytes on m_tdata corresponding to
>> -- bytes invalidated by s_tvalid.
>> signal pre_tvalid : STD_LOGIC;
>>
>> -- internal version of m_tvalid output.
>> -- pre_tvalid with periods of low s_tvalid_i_d_d punched out
>> signal m_tvalid_i : STD_LOGIC;
>>
>> -- internal version of m_tready input
>> -- also the internal version of s_tready output
>> -- passes through m_tready to s_tready with no clk delays
>> signal m_tready_i : STD_LOGIC;
>>
>> -- constant byte value 0xFF, for comparison purposes
>> constant all_ones : STD_LOGIC_VECTOR(input_data'range) :=
>> (others => '1');
>>
>> -- constant byte value 0x00, for comparison purposes
>> constant all_zeros : STD_LOGIC_VECTOR(input_data'range) :=
>> (others => '0');
>>
>> begin
>>
>> -- asynchronous assignments
>>
>> frame_sep <= '1' when input_data_d = all_zeros and s_tvalid_i_d = '1'
>> else '0';
>>
>> m_tlast_i <= '1' when ((frame_sep = '1' and m_tvalid_i = '1' and
>> m_tready = '1'))
>> else '0';
>>
>> counter_load <= '1' when (input_data_d /= all_zeros and
>> frame_sep_d = '1' and s_tvalid_i_d = '1') -- start of frame
>> or (to_integer(unsigned(count)) = 1 and
>> s_tvalid_i_d = '1') -- start of next sequence in frame
>> else '0';
>>
>> m_tvalid_i <= '1' when ((pre_tvalid = '1' and s_tvalid_i_d_d = '1'
>> -- usual case, if input_data was valid
>> and not (to_integer(unsigned(count)) = 1 and
>> s_tvalid_i_d = '0')) -- defer last byte; might be m_tlast
>> or (pre_tvalid = '1' and
>> to_integer(unsigned(count)) = 1
>> and s_tvalid_i_d = '1' and s_tvalid_i_d_d =
>> '0')) -- pick up that deferred last byte
>> else '0';
>>
>>
>> s_tready <= m_tready_i;
>> m_tdata <= output_data;
>> input_data <= s_tdata;
>> s_tvalid_i <= s_tvalid;
>> m_tready_i <= m_tready;
>> m_tvalid <= m_tvalid_i;
>> m_tlast <= m_tlast_i;
>>
>> -- processes
>>
>>
>>
>> set_case_255 : process (rst, clk)
>> begin
>> if rst = '1' then
>> case_255 <= '0';
>> elsif rising_edge(clk) and m_tready_i = '1' then
>> if counter_load = '1' and input_data_d = all_ones then
>> case_255 <= '1';
>> elsif counter_load = '1' and input_data_d /= all_ones then
>> case_255 <= '0';
>> end if;
>> end if;
>> end process set_case_255;
>>
>>
>>
>> delay_s_tvalid : process (rst, clk)
>> begin
>> if rst = '1' then
>> s_tvalid_i_d <= '0';
>> s_tvalid_i_d_d <= '0';
>> elsif rising_edge(clk) and m_tready_i = '1' then
>> s_tvalid_i_d <= s_tvalid_i;
>> s_tvalid_i_d_d <= s_tvalid_i_d;
>> end if;
>> end process delay_s_tvalid;
>>
>>
>>
>> create_pre_tvalid : process (rst, clk)
>> begin
>> if rst = '1' then
>> counter_load_d <= '0';
>> pre_tvalid <= '0';
>> elsif rising_edge(clk) and m_tready_i = '1' then
>> if s_tvalid_i_d = '1' then
>> counter_load_d <= counter_load;
>> if (frame_sep_d_d = '1' and frame_sep_d = '0')
>> -- normal last byte of frame
>> or (counter_load_d = '1' and frame_sep_d = '0')
>> -- normal first byte of a sequence
>> then
>> pre_tvalid <= '1';
>> end if;
>> end if;
>> if frame_sep = '1'
>> then
>> pre_tvalid <= '0';
>> end if;
>> if counter_load = '1' and case_255 = '1' then
>> pre_tvalid <= '0';
>> end if;
>> end if;
>> end process create_pre_tvalid;
>>
>>
>>
>> delay_m_tlast_i : process (rst, clk)
>> begin
>> if rst = '1' then
>> m_tlast_i_d <= '0';
>> m_tlast_i_d_d <= '0';
>> elsif rising_edge(clk) and m_tready_i = '1' then
>> m_tlast_i_d <= m_tlast_i;
>> m_tlast_i_d_d <= m_tlast_i_d;
>> end if;
>> end process delay_m_tlast_i;
>>
>>
>>
>> set_counter : process (rst,clk)
>> begin
>> if rst = '1' then
>> count <= (others => '0');
>> frame_sep_d <= '0';
>> frame_sep_d_d <= '0';
>> elsif rising_edge(clk) and m_tready_i = '1' then
>> frame_sep_d_d <= frame_sep_d;
>> if s_tvalid_i_d = '1' then
>> frame_sep_d <= frame_sep;
>> if counter_load = '1' then
>> count <= input_data_d;
>> elsif count /= all_zeros
>> then
>> count <= STD_LOGIC_VECTOR(unsigned(count) - 1);
>> end if;
>> end if;
>> end if;
>> end process set_counter;
>>
>>
>>
>> create_output : process (rst, clk)
>> begin
>> if rst = '1' then
>> output_data <= (others => '0');
>> elsif rising_edge(clk) and m_tready_i = '1' then
>> if counter_load = '1'
>> then
>> output_data <= all_zeros;
>> elsif s_tvalid_i_d = '1' then
>> output_data <= input_data_d;
>> end if;
>> end if;
>> end process create_output;
>>
>>
>>
>> selective_delay_of_input_data : process (rst,clk)
>> begin
>> if rst = '1' then
>> input_data_d <= all_zeros;
>> elsif rising_edge(clk) and m_tready_i = '1' then
>> if s_tvalid_i = '1' then
>> input_data_d <= input_data;
>> end if;
>> end if;
>> end process selective_delay_of_input_data;
>>
>>
>> end Behavioral;
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
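For readers who want to follow the VHDL without a simulator, here is a frame-level (not cycle-accurate) Python model of the behavior its comments describe: 0x00 separates frames, each code byte loads a count, a sequence's implied zero is emitted only once the next code byte arrives (so the final implied zero of a frame is dropped), no implied zero follows a 0xFF sequence (the case_255 flag), and m_tlast is derived from the COBS framing rather than from s_tlast. It ignores tvalid/tready timing entirely and assumes well-formed input; `decode_stream` is an illustrative name, and this is a sketch, not ORI's test code.

```python
def decode_stream(stream: bytes):
    """Frame-level model: return a list of (m_tdata, m_tlast) output beats."""
    beats = []
    frame = []             # decoded bytes of the frame in progress
    count = 0              # loaded from a code byte; 1 once its data is used up
    case_255 = False       # last code byte was 0xFF: no implied zero follows
    first_in_frame = True
    for byte in stream:
        if byte == 0x00:                            # frame separator
            if frame:
                beats += [(b, False) for b in frame[:-1]]
                beats.append((frame[-1], True))     # m_tlast from COBS framing
            frame, count, first_in_frame = [], 0, True
            continue
        if count <= 1:                              # this byte is a code byte
            if not first_in_frame and not case_255:
                frame.append(0x00)                  # previous sequence's implied zero
            count = byte
            case_255 = byte == 0xFF
            first_in_frame = False
        else:                                       # literal data byte
            frame.append(byte)
            count -= 1
    return beats


for data, last in decode_stream(bytes([0x03, 0x11, 0x22, 0x02, 0x33, 0x00])):
    print(f"m_tdata={data:02x} m_tlast={int(last)}")
```

Deferring each implied zero until the next code byte is what lets the model drop the last one without looking ahead, which appears to be the same reason the VHDL drives output_data to zero on counter_load.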
>>
>> This codebase includes not just the source code but also a testbench,
>> a design document, timing diagrams, a Python notebook that creates
>> test vectors, and Python code that checks the results the testbench
>> produces from those test vectors. ChatGPT produced none of that.
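The test-vector side of such a flow can be sketched in Python as well. This is a hypothetical illustration, not ORI's notebook: `cobs_encode` follows the standard COBS encoding, `make_vectors` builds a stimulus stream of encoded frames separated by 0x00 together with the payloads a correct decoder should reproduce, and `compare` groups observed (m_tdata, m_tlast) beats into frames and checks them against those payloads. Payloads are at least one byte long, since an empty frame would produce no output beat to carry m_tlast.

```python
import random


def cobs_encode(payload: bytes) -> bytes:
    """Standard COBS encoding of one payload, without the 0x00 delimiter."""
    out = bytearray([0])            # placeholder for the first code byte
    code_index, code = 0, 1
    for i, byte in enumerate(payload):
        if byte != 0:
            out.append(byte)
            code += 1
        if byte == 0 or code == 0xFF:
            out[code_index] = code  # close the current group
            code_index, code = len(out), 1
            if byte == 0 or i + 1 < len(payload):
                out.append(0)       # placeholder for the next code byte
    if code_index < len(out):
        out[code_index] = code      # close the final group
    return bytes(out)


def make_vectors(n_frames: int, max_len: int = 300, seed: int = 1):
    """Return (stimulus byte stream, expected payloads) for a decoder testbench."""
    rng = random.Random(seed)
    stream, payloads = bytearray(), []
    for _ in range(n_frames):
        length = rng.randint(1, max_len)
        # zero_prob of 0.0 gives long nonzero runs, exercising 0xFF groups
        zero_prob = rng.choice((0.0, 0.02, 0.5))
        payload = bytes(0 if rng.random() < zero_prob else rng.randint(1, 255)
                        for _ in range(length))
        payloads.append(payload)
        stream += cobs_encode(payload) + b"\x00"   # 0x00 separates frames
    return bytes(stream), payloads


def compare(expected_payloads, observed_beats) -> bool:
    """Group (m_tdata, m_tlast) beats into frames and check them."""
    frames, current = [], bytearray()
    for data, last in observed_beats:
        current.append(data)
        if last:
            frames.append(bytes(current))
            current = bytearray()
    # leftover beats without m_tlast also count as a failure
    return not current and frames == list(expected_payloads)
```

In a real flow the stimulus stream would drive s_tdata in the testbench and the captured m_tdata/m_tlast beats would be handed to `compare`; how ORI's own testbench and notebook exchange that data isn't described here.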
>>
>> At this point, AI can write some code in some languages.
>>
>> This is a situation similar to algorithmic musical composition.
>> Algorithms can come up with ideas and sketches, or traverse a search
>> space, more quickly than a human. They can help a composer, but do not
>> replace one. Algorithms can produce some sparkling gems of ideas,
>> fresh cuts and views, but these are mixed in with a lot of music that
>> is just plain wrong.
>>
>> If your programming job happens to be in a well-defined area that is
>> easily automated, then you have something to worry about today. All
>> jobs are a target for automation. Yes, all of them. As AI/ML becomes
>> more capable, it will automate even things like testbench writing (HDL
>> Coder already does something like this) and support code (like a
>> Python notebook that creates a solid set of test vectors for a
>> testbench) and design documents. All of us should be worrying about
>> some number of tomorrows in the future.
>>
>> That day is not yet "today" with ChatGPT, when it comes to writing VHDL.
>>
> --
-Michelle Thompson
"Potestatem obscuri lateris nescis."