[Ground-station] COBS Decoder: Human vs OpenAI GPT

Michelle Thompson mountain.michelle at gmail.com
Mon Dec 5 02:33:28 PST 2022


Thank you, Mark.

It’s posted to the ORI website as a blog entry, but there are no plans to
publish it elsewhere. We do not mind if it’s forwarded to Hack-a-Day.

I’ll be on travel through the 10th, so would you please take the lead on a
submission?

-Michelle

On Sun, Dec 4, 2022 at 22:43 Mark Whittington <markwhi at gmail.com> wrote:

> Michelle and Paul:
>
> This was a fascinating read! I have seen a lot of coverage of AI-generated
> code (mostly GitHub's Copilot) but this is the first test case that I've
> seen involving an HDL. Do you plan to publish this work somewhere other
> than this list?  Would you mind if I forward it to the editors at
> Hack-a-Day, or would you consider doing so yourselves?
>
> Thank you both for the write-up and to anyone else involved!
>
> On Sat, Dec 3, 2022, 1:31 PM Michelle Thompson via Ground-Station
> <ground-station at lists.openresearch.institute> wrote:
>
>> Greetings all,
>>
>> As you know, ORI has been active in a variety of regulatory work for
>> several years now. One of those areas has been AI/ML and its effect
>> on communications. We served on the US FCC Technological Advisory
>> Council (TAC) in 2022, presented on the subject, and are working
>> towards an AI/ML Handbook for Amateur Radio.
>>
>> You may have seen news about OpenAI GPT, or ChatGPT.
>>
>> There are a lot of people testing this chatbot for reverse engineering
>> and for writing software. There have been some successes in cases
>> where the model would have had access to code or documentation about
>> protocols, observations, or technical behavior during training, and
>> where the requests match up well with what the model learned.
>>
>> For example, a friend asked the chatbot to create a Python script to
>> access the webcam on their computer, and they shared a still image
>> from the resulting feed. Another friend got better reverse engineering
>> results from it than they had gotten with either IDA or Ghidra over
>> many weeks.
>>
>> So, how does ChatGPT handle VHDL?
>>
>> We have something specific to compare against, since we just completed
>> a COBS decoder in VHDL for our open source HEO/GEO transponder work
>> for Haifuraiya.
>>
>> Let's start off simple. Here's the first dialogue with the chatbot,
>> and the results.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> Abraxas3d: Design a COBS decoder in VHDL
>>
>> Sure, here's an example of a COBS decoder in VHDL:
>>
>>
>>
>> library IEEE;
>> use IEEE.STD_LOGIC_1164.ALL;
>>
>> entity COBS_decoder is
>>   Port (
>>     input_data : in std_logic_vector;
>>     decoded_data : out std_logic_vector
>>   );
>> end COBS_decoder;
>>
>> architecture Behavioral of COBS_decoder is
>>   -- Declare a variable to keep track of the current index
>>   -- in the input and output data streams
>>   variable current_index : integer := 1;
>>
>>   -- Declare a variable to store the length of the
>>   -- run of zeros in the input data
>>   variable zero_run_length : integer := 0;
>> begin
>>   -- Iterate over the input data
>>   for i in input_data'range loop
>>     -- If the current input value is a zero, increment
>>     -- the zero run length variable
>>     if input_data(i) = '0' then
>>       zero_run_length := zero_run_length + 1;
>>     -- Otherwise, the current input value is not a zero,
>>     -- so we've reached the end of the zero run
>>     else
>>       -- Set the decoded value at the current index to
>>       -- the length of the zero run
>>       decoded_data(current_index) := conv_std_logic_vector(zero_run_length, 8);
>>
>>       -- Increment the current index and reset the zero
>>       -- run length for the next iteration
>>       current_index := current_index + 1;
>>       zero_run_length := 0;
>>     end if;
>>   end loop;
>> end Behavioral;
>>
>> This code decodes the input data using the COBS (Consistent Overhead
>> Byte Stuffing) algorithm, which involves replacing runs of zeros in
>> the input data with a single byte that indicates the length of the
>> zero run. For example, the input '101001000' would be decoded as
>> '103101'.
>>
>> This decoder iterates over the input data and maintains a variable
>> zero_run_length that is incremented for each zero in the input data.
>> When a non-zero value is encountered, the length of the zero run is
>> written to the output data and the zero_run_length variable is reset.
>>
>> I hope this helps! Let me know if you have any other questions.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> It knew what COBS meant. It produced code, with comments, and a brief
>> "design document", with examples.
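
For comparison, real COBS does not encode the lengths of zero runs. It removes every 0x00 from a frame so that 0x00 can serve as the frame delimiter. Each code byte n (1 to 255) is followed by n-1 non-zero data bytes, and an implied zero follows those bytes unless n is 0xFF or the group is the last one in the frame. A minimal Python sketch of decoding one frame, assuming the trailing 0x00 delimiter has already been stripped (the function name is illustrative, not taken from ORI's code):

```python
def cobs_decode(frame: bytes) -> bytes:
    """Decode one COBS frame whose trailing 0x00 delimiter was already removed."""
    if 0 in frame:
        raise ValueError("a COBS frame must not contain 0x00")
    out = bytearray()
    i = 0
    while i < len(frame):
        code = frame[i]          # group length: the code byte plus code-1 data bytes
        end = i + code
        if end > len(frame):
            raise ValueError("code byte points past the end of the frame")
        out += frame[i + 1:end]  # copy the data bytes of this group
        i = end
        # An implied zero follows the group unless its code was 0xFF
        # or it is the last group in the frame.
        if code != 0xFF and i < len(frame):
            out.append(0)
    return bytes(out)
```

For example, the encoded frame 03 11 22 02 33 decodes to 11 22 00 33.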
>>
>> Our open source HEO transponder project uses AXI-based blocks in the
>> FPGA. So, let's add AXI to the request.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> Abraxas3d: Design a COBS decoder with AXI interface in VHDL
>>
>> Sure, here's an example of a COBS decoder with an AXI interface in VHDL:
>>
>> library IEEE;
>> use IEEE.STD_LOGIC_1164.ALL;
>> use IEEE.NUMERIC_STD.ALL;
>>
>> -- AXI interface definitions
>> library unisim;
>> use unisim.vcomponents.all;
>>
>> entity COBS_decoder is
>>   Port (
>>     AXI_IN  : in  AXI4Stream;
>>     AXI_OUT : out AXI4Stream
>>   );
>> end COBS_decoder;
>>
>> architecture Behavioral of COBS_decoder is
>>   -- Declare a variable to keep track of the current index
>>   -- in the input and output data streams
>>   variable current_index : integer := 1;
>>
>>   -- Declare a variable to store the length of the
>>   -- run of zeros in the input data
>>   variable zero_run_length : integer := 0;
>> begin
>>   -- Iterate over the input data
>>   for i in AXI_IN.data'range loop
>>     -- If the current input value is a zero, increment
>>     -- the zero run length variable
>>     if AXI_IN.data(i) = '0' then
>>       zero_run_length := zero_run_length + 1;
>>     -- Otherwise, the current input value is not a zero,
>>     -- so we've reached the end of the zero run
>>     else
>>       -- Set the decoded value at the current index to
>>       -- the length of the zero run
>>       AXI_OUT.data(current_index) := conv_std_logic_vector(zero_run_length, 8);
>>
>>       -- Increment the current index and reset the zero
>>       -- run length for the next iteration
>>       current_index := current_index + 1;
>>       zero_run_length := 0;
>>     end if;
>>   end loop;
>>
>>   -- Set the valid and last flags in the output AXI stream
>>   AXI_OUT.valid <= '1';
>>   AXI_OUT.last  <= '1';
>> end Behavioral;
>>
>> This code is similar to the previous example, but it uses an AXI
>> (Advanced eXtensible Interface) interface for the input and output
>> data. The AXI interface is a standard interface used in many FPGA
>> designs, and it provides a way to stream data into and out of the
>> decoder.
>>
>> In this code, the input data is read from the AXI_IN.data port, and
>> the decoded output is written to the AXI_OUT.data port. The valid and
>> last flags are also set in the output AXI stream to indicate that the
>> data is valid and that it is the last data in the stream.
>>
>> I hope this helps! Let me know if you have any other questions.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>> It remembered the previous request and updated the design.
>>
>> Here's a review of the COBS implementation from Paul Williamson KB5MU.
>>
>> "I don’t know what this is, but it isn’t a COBS decoder with AXI
>> interfaces.
>>
>> It does seem at first glance to be some sort of thing with AXI
>> interfaces, but it does not implement an AXI interface at all. It just
>> relies on a (possibly imaginary) library type called AXI4Stream that
>> somehow magically does AXI interfaces, in a way that’s both
>> implausibly smart and inexplicably dumb. I think the classic phrase
>> “not even wrong” applies here.
>>
>> The AXI4Stream object apparently has a member .data that is an array
>> of bytes, which might make sense if the entire AXI transfer is being
>> treated as a single entity here. The code iterates through the input
>> .data using the ’range attribute, as if the AXI4Stream object has a
>> defined fixed transfer length.
>>
>> Which it can’t have, if it’s a general purpose AXI stream implementation.
>>
>> So, if this was ever anything real, it must have been a special
>> purpose AXI implementation with a fixed transfer length. Assuming that
>> the AXI4Stream .data actually works that way, the entity does
>> do some sort of transformation on the input data to create the output
>> data.
>>
>> We asked for that to be a COBS decoder, but the function that it
>> actually implements bears only the slightest resemblance to a COBS
>> decoder. What it actually does is output a stream of words that are
>> the lengths of the runs of zeroes in the input data. Nonzero data does
>> not get passed through at all.
>>
>> Instead, a run of nonzero bytes gets translated into a run of zeroes,
>> as if two adjacent nonzero bytes meant nothing but a zero-length run
>> of zeroes.
>>
>> Here’s an example of what this code seems to do:
>>   Input  .data:  1 2 3 4 0 0 0 0 1 2 3 4
>>   Output .data:  0 0 0 0         4 0 0 0
>>
>> That’s certainly not a COBS decoder, or even anything useful at all.
>>
>> Note that the output is variable length, but never longer than the input.
>> We had to assume that the AXI4Stream object had a fixed .data size to
>> get this far, but now it has to have a variable data size for the
>> output.
>>
>> Supposing we had successfully suspended disbelief up to here, we now
>> examine the final two lines of code in the Behavioral architecture. It
>> sets the .valid and .last members of the AXI_OUT port to ‘1’.
>>
>> That makes absolutely no sense if it’s supposed to encapsulate an
>> entire AXI stream transfer. Those signals only make sense if we’re
>> treating one word at a time, as you’d expect to see in a normal,
>> general-purpose AXI stream object.
>>
>> The extremely verbose commenting style in this code suggests that the
>> original source material was from an introductory tutorial. Maybe the
>> AI has been tuned to favor heavily commented code, and ends up picking
>> up partially completed junk from the middle of tutorials?
>> Anyhow, this is not an example of automated programming in response to
>> a high-level query. This is gibberish that succeeds only in having a
>> superficial resemblance to the code requested."
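
Paul's reading of the generated loop can be checked with a few lines of Python that mirror it. This is a sketch under his assumption that .data behaves like an array of bytes (the generated VHDL actually compares each element with the bit literal '0'), and the function name is made up for illustration:

```python
def chatgpt_loop_model(data):
    """Mimic the generated VHDL loop, treating each .data element as a byte."""
    out = []
    zero_run_length = 0
    for value in data:
        if value == 0:
            # Zeros are counted but never copied to the output.
            zero_run_length += 1
        else:
            # A nonzero byte writes out the current run length and resets it;
            # the nonzero value itself is thrown away.
            out.append(zero_run_length)
            zero_run_length = 0
    # A trailing run of zeros is never written out.
    return out
```

Fed Paul's input, 1 2 3 4 0 0 0 0 1 2 3 4, it returns 0 0 0 0 4 0 0 0, matching his table. Since at most one value is emitted per input byte, the output is never longer than the input.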
>>
>> So, does this work?
>>
>> Let's try it out in the lab!
>>
>> The first thing we notice is that there is an obvious syntax error
>> with a missing semicolon. Aside from that, there is no AXI4Stream port
>> type. It's not recognized. Vivado appears to be disappointed in this
>> file, and wants little to do with this code. This is where we should
>> stop, but I spent some time looking to see if the ports were at least
>> close to being declared correctly.
>>
>> The model appears to have borrowed some custom (or maybe enumerated)
>> type from somewhere and built an AXI4Stream type out of it. The unisim
>> library does not define any such type, and that's exactly the sort of
>> thing that automated code writing needs to get right.
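
A real AXI4-Stream interface is a set of per-word handshake signals rather than a single object, which is why Paul notes that .valid and .last only make sense one word at a time. A word (a "beat") transfers only on a clock edge where both TVALID and TREADY are high, and TLAST marks the final beat of a packet, so packet length is set by the source rather than fixed. A small Python sketch that groups a cycle-by-cycle trace into packets (the trace format and all names are hypothetical):

```python
from typing import List, NamedTuple


class Cycle(NamedTuple):
    """Hypothetical per-clock snapshot of one AXI4-Stream interface."""
    tvalid: int
    tready: int
    tdata: int
    tlast: int


def collect_packets(trace: List[Cycle]) -> List[List[int]]:
    packets: List[List[int]] = []
    current: List[int] = []
    for c in trace:
        # A beat moves only when the source (tvalid) and sink (tready) agree.
        if c.tvalid and c.tready:
            current.append(c.tdata)
            if c.tlast:
                # TLAST closes the packet; each packet can have a different length.
                packets.append(current)
                current = []
    return packets
```

Driving TVALID and TLAST to a constant '1', as the generated code does, would make every accepted beat a one-word packet.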
>>
>> Let's look at the code written by humans from ORI.
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
>>
>>
>> ----------------------------------------------------------------------------------
>> -- Company: Open Research Institute, Inc.
>> -- Engineer: Skunkwrx, Abraxas3d
>> --
>> -- Design Name: COBS protocol decoder
>> -- Module Name: decoder - Behavioral
>> -- Project Name: Phase 4 "Haifuraiya"
>> -- Target Devices: 7000 Zynq
>> -- Tool Versions: 2021.1
>> -- Description: COBS protocol decoder.
>> -- https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing
>> --
>> -- Dependencies:
>> --
>> -- Additional Comments: This work is Open Source and licensed using CERN OHL v2.0
>> --
>>
>> ----------------------------------------------------------------------------------
>>
>>
>> library IEEE;
>> use IEEE.STD_LOGIC_1164.ALL;
>>
>> use IEEE.NUMERIC_STD.ALL;
>>
>> --Entity Declaration
>> entity decoder is
>>     Port ( rst      : in STD_LOGIC;
>>            clk      : in STD_LOGIC;
>>            s_tdata  : in STD_LOGIC_VECTOR (7 downto 0);
>>            s_tlast  : in STD_LOGIC;
>>            s_tvalid : in STD_LOGIC;
>>            s_tready : out STD_LOGIC;
>>            m_tdata  : out STD_LOGIC_VECTOR (7 downto 0);
>>            m_tlast  : out STD_LOGIC;
>>            m_tvalid : out STD_LOGIC;
>>            m_tready : in STD_LOGIC);
>> end decoder;
>>
>> --Architecture
>> architecture Behavioral of decoder is
>>
>> -- internal copy of s_tdata
>>     signal input_data          : STD_LOGIC_VECTOR (7 downto 0);
>>
>>     -- s_tdata delayed by one clk where s_tvalid is high
>>     signal input_data_d        : STD_LOGIC_VECTOR (7 downto 0);
>>
>>     -- internal version of s_tlast
>>     -- not used for anything
>>     -- We create our own m_tlast from the COBS framing instead
>>     signal s_tlast_i           : STD_LOGIC;
>>
>>     -- internal version of input handshaking signal s_tvalid
>>     signal s_tvalid_i          : STD_LOGIC;
>>
>>     -- s_tvalid delayed by one clk
>>     -- thus, a version of s_tvalid aligned with input_data_d
>>     signal s_tvalid_i_d        : STD_LOGIC;
>>
>>     -- s_tvalid delayed by two clks
>>     -- thus, a version of s_tvalid aligned with output_data
>>     signal s_tvalid_i_d_d      : STD_LOGIC;
>>
>>     -- countdown of bytes in this sequence
>>     -- loads from input_data_d when it represents the frame count
>>     --   as determined by signal counter_load
>>     -- contains the count while m_tdata contains the implied 0 at
>>     --   the end of a sequence, or the next count that replaced it
>>     -- counts down thereafter, to contain 1 when the last non-zero
>>     --   byte of the sequence is on m_tdata
>>     -- allowed to count down to 0, but not beyond
>>     signal count               : STD_LOGIC_VECTOR (7 downto 0);
>>
>> -- enable to load count from input_data_d on this clk edge
>> -- two cases detected:
>> --    * first valid non-zero byte after a frame separator
>> --    * first valid byte after count is exhausted
>> -- allowed to be high for multiple cycles
>>     signal counter_load        : STD_LOGIC;
>>
>>     -- counter_load delayed by one clk where s_tvalid is high
>>     -- used to identify the first valid data byte of any sequence,
>>     --   for purposes of computing m_tvalid (via pre_tvalid)
>>     signal counter_load_d      : STD_LOGIC;
>>
>>     -- detection of a valid frame separator (zero) byte in input_data_d
>>     signal frame_sep           : STD_LOGIC;
>>
>>     -- frame_sep delayed by one clk where s_tvalid is high
>>     -- used to compute counter_load
>>     -- used to compute rising edge of pre_tvalid
>>     signal frame_sep_d         : STD_LOGIC;
>>
>>     -- frame_sep_d delayed by an additional clk (not depending on s_tvalid)
>>     -- used to find the first non-zero byte of the new frame
>>     signal frame_sep_d_d       : STD_LOGIC;
>>
>>     -- move the frame_sep signal that occurred during m_tready low
>>     -- out to the first cycle when m_tready is high again
>>     signal use_saved_frame_sep : STD_LOGIC;
>>
>>     -- flag to remember that the frame count for this sequence was 255,
>>     --   to handle the special case that such a sequence does not have
>>     --   an implied zero byte at the end.
>>     -- set when loading count with 255
>>     -- cleared when the counter is reloaded with anything else
>>     signal case_255            : STD_LOGIC;
>>
>>     -- internal version of m_tdata output
>>     signal output_data         : STD_LOGIC_VECTOR (7 downto 0);
>>
>>     -- internal version of m_tlast output
>>     -- high when the last byte of a frame is valid on m_tdata
>>     signal m_tlast_i           : STD_LOGIC;
>>
>>     -- delayed versions of m_tlast
>>     signal m_tlast_i_d         : STD_LOGIC;
>>     signal m_tlast_i_d_d       : STD_LOGIC;
>>
>>     -- intermediate result for m_tvalid.
>>     -- high across all data bytes of each sequence on m_tdata
>>     -- does not go low for bytes on m_tdata corresponding to
>>     --   bytes invalidated by s_tvalid.
>>     signal pre_tvalid          : STD_LOGIC;
>>
>>     -- internal version of m_tvalid output.
>>     -- pre_tvalid with periods of low s_tvalid_i_d_d punched out
>>     signal m_tvalid_i          : STD_LOGIC;
>>
>>     -- internal version of m_tready input
>>     -- also the internal version of s_tready output
>>     -- passes through m_tready to s_tready with no clk delays
>>     signal m_tready_i          : STD_LOGIC;
>>
>>     -- constant byte value 0xFF, for comparison purposes
>>     signal all_ones            : STD_LOGIC_VECTOR(input_data'range) := (others => '1');
>>
>>     -- constant byte value 0x00, for comparison purposes
>>     signal all_zeros           : STD_LOGIC_VECTOR(input_data'range) := (others => '0');
>>
>> begin
>>
>>     -- asynchronous assignments
>>
>>     frame_sep <= '1' when input_data_d = all_zeros and s_tvalid_i_d = '1'
>>                 else '0';
>>
>>     m_tlast_i <= '1' when ((frame_sep = '1' and m_tvalid_i = '1' and m_tready = '1'))
>>                  else '0';
>>
>>     counter_load <= '1' when
>>                         -- start of frame
>>                         (input_data_d /= all_zeros and frame_sep_d = '1' and s_tvalid_i_d = '1')
>>                         -- start of next sequence in frame
>>                      or (to_integer(unsigned(count)) = 1 and s_tvalid_i_d = '1')
>>                     else '0';
>>
>>     m_tvalid_i <= '1' when (
>>                         -- usual case, if input_data was valid;
>>                         -- defer last byte, which might be m_tlast
>>                         (pre_tvalid = '1' and s_tvalid_i_d_d = '1'
>>                          and not (to_integer(unsigned(count)) = 1 and s_tvalid_i_d = '0'))
>>                         -- pick up that deferred last byte
>>                      or (pre_tvalid = '1' and to_integer(unsigned(count)) = 1
>>                          and s_tvalid_i_d = '1' and s_tvalid_i_d_d = '0'))
>>                   else '0';
>>
>>
>>     s_tready <= m_tready_i;
>>     m_tdata <= output_data;
>>     input_data <= s_tdata;
>>     s_tvalid_i <= s_tvalid;
>>     m_tready_i <= m_tready;
>>     m_tvalid <= m_tvalid_i;
>>     m_tlast <= m_tlast_i;
>>
>> -- processes
>>
>>
>>
>>     set_case_255 : process (rst, clk)
>>     begin
>>         if rst = '1' then
>>             case_255 <= '0';
>>         elsif rising_edge(clk) and m_tready_i = '1' then
>>             if counter_load = '1' and input_data_d = all_ones then
>>                 case_255 <= '1';
>>             elsif counter_load = '1' and input_data_d /= all_ones then
>>                 case_255 <= '0';
>>             end if;
>>         end if;
>>     end process set_case_255;
>>
>>
>>
>>     delay_s_tvalid : process (rst, clk)
>>     begin
>>         if rst = '1' then
>>             s_tvalid_i_d <= '0';
>>             s_tvalid_i_d_d <= '0';
>>         elsif rising_edge(clk) and m_tready_i = '1' then
>>             s_tvalid_i_d <= s_tvalid_i;
>>             s_tvalid_i_d_d <= s_tvalid_i_d;
>>         end if;
>>     end process delay_s_tvalid;
>>
>>
>>
>>     create_pre_tvalid : process (rst, clk)
>>     begin
>>         if rst = '1' then
>>             counter_load_d <= '0';
>>             pre_tvalid <= '0';
>>         elsif rising_edge(clk) and m_tready_i = '1' then
>>             if s_tvalid_i_d = '1' then
>>                 counter_load_d <= counter_load;
>>                 if (frame_sep_d_d = '1' and frame_sep_d = '0')   -- normal last byte of frame
>>                 or (counter_load_d = '1' and frame_sep_d = '0')  -- normal first byte of a sequence
>>                 then
>>                     pre_tvalid <= '1';
>>                 end if;
>>             end if;
>>             if frame_sep = '1'
>>             then
>>                 pre_tvalid <= '0';
>>             end if;
>>             if counter_load = '1' and case_255 = '1' then
>>                 pre_tvalid <= '0';
>>             end if;
>>         end if;
>>     end process create_pre_tvalid;
>>
>>
>>
>>     delay_m_tlast_i : process (rst, clk)
>>     begin
>>         if rst = '1' then
>>             m_tlast_i_d <= '0';
>>             m_tlast_i_d_d <= '0';
>>         elsif rising_edge(clk) and m_tready_i = '1' then
>>             m_tlast_i_d <= m_tlast_i;
>>             m_tlast_i_d_d <= m_tlast_i_d;
>>         end if;
>>     end process delay_m_tlast_i;
>>
>>
>>
>>     set_counter : process (rst,clk)
>>     begin
>>         if rst = '1' then
>>             count <= (others => '0');
>>             frame_sep_d <= '0';
>>             frame_sep_d_d <= '0';
>>         elsif rising_edge(clk) and m_tready_i = '1' then
>>             frame_sep_d_d <= frame_sep_d;
>>             if s_tvalid_i_d = '1' then
>>                 frame_sep_d <= frame_sep;
>>                 if counter_load = '1' then
>>                     count <= input_data_d;
>>                 elsif count /= all_zeros
>>                 then
>>                     count <= STD_LOGIC_VECTOR(unsigned(count) - 1);
>>                 end if;
>>             end if;
>>         end if;
>>     end process set_counter;
>>
>>
>>
>>     create_output : process (rst, clk)
>>     begin
>>         if rst = '1' then
>>             output_data <= (others => '0');
>>         elsif rising_edge(clk) and m_tready_i = '1' then
>>             if counter_load = '1'
>>             then
>>                 output_data <= all_zeros;
>>             elsif s_tvalid_i_d = '1' then
>>                 output_data <= input_data_d;
>>             end if;
>>         end if;
>>     end process create_output;
>>
>>
>>
>>     selective_delay_of_input_data : process (rst,clk)
>>     begin
>>         if rst = '1' then
>>             input_data_d <= all_zeros;
>>         elsif rising_edge(clk) and m_tready_i = '1' then
>>             if s_tvalid_i = '1' then
>>                 input_data_d <= input_data;
>>             end if;
>>         end if;
>>     end process selective_delay_of_input_data;
>>
>>
>> end Behavioral;
>>
>> -=-=-=-=-=-=-=-=-=-=-=-=-=-
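
The signal comments above lay out the approach: a zero byte is the frame separator, count is loaded from each code byte and counts down through the group, case_255 suppresses the implied zero after a 0xFF code, and the last byte is deferred because it might carry m_tlast. The Python sketch below models that same idea one input byte at a time, emitting (data, tlast) pairs. It is a behavioral sketch with made-up names, not a line-by-line translation: it ignores tvalid/tready handshaking and the pipeline delays of the real entity.

```python
class StreamingCobsDecoder:
    """Behavioral byte-at-a-time COBS decoder model (hypothetical, not ORI's code)."""

    def __init__(self):
        self.count = 0         # data bytes still expected in the current group
        self.case_255 = False  # current group's code was 0xFF: no implied zero after it
        self.in_frame = False  # at least one code byte seen since the last separator
        self.held = None       # newest output byte, held until we know whether it is last

    def _release(self, out, last):
        # Emit the held byte (if any) with the given tlast value.
        if self.held is not None:
            out.append((self.held, last))
            self.held = None

    def push(self, byte):
        """Accept one encoded byte; return the (data, tlast) beats it completes."""
        out = []
        if byte == 0:
            # Frame separator: the held byte was the last byte of the frame.
            self._release(out, True)
            self.count, self.case_255, self.in_frame = 0, False, False
        elif self.count == 0:
            # Code byte: the frame continues, so the held byte is not last.
            self._release(out, False)
            if self.in_frame and not self.case_255:
                # The previous group's implied zero is now confirmed.
                self.held = 0
            self.count = byte - 1
            self.case_255 = byte == 0xFF
            self.in_frame = True
        else:
            # Ordinary data byte inside a group.
            self._release(out, False)
            self.held = byte
            self.count -= 1
        return out


def decode_stream(encoded):
    """Run a whole encoded byte stream through the model."""
    decoder = StreamingCobsDecoder()
    beats = []
    for byte in encoded:
        beats.extend(decoder.push(byte))
    return beats
```

Holding each output byte back by one input byte is what lets the model attach tlast once the separator arrives, much as the VHDL defers the last byte of a sequence. An empty frame (01 00) produces no beats at all in this sketch.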
>>
>> This codebase includes not just the source code but also a testbench,
>> a design document, timing diagrams, a Python notebook that creates test
>> vectors, and Python code that checks the results the testbench produces
>> from those vectors. ChatGPT produced none of that.
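
ORI's notebook and comparison scripts aren't shown in this post. As a rough illustration of that kind of flow, not a reproduction of it, here is a hypothetical Python sketch: it builds frames (including edge cases around the 254-byte group limit), COBS-encodes them into one 0x00-delimited stream that a testbench could read, and checks that decoding each frame returns the original. The function names, frame counts, and the 10% zero bias are all assumptions.

```python
import random


def cobs_encode(data: bytes) -> bytes:
    """COBS-encode one frame (no trailing delimiter)."""
    out = bytearray([0])  # placeholder for the first code byte
    code_index = 0
    code = 1
    for i, b in enumerate(data):
        if b != 0:
            out.append(b)
            code += 1
        if b == 0 or code == 0xFF:
            # Close the current group by writing its code byte.
            out[code_index] = code
            code = 1
            code_index = len(out)
            # Open a new group unless a full 254-byte group ended the data.
            if b == 0 or i + 1 < len(data):
                out.append(0)
    if code_index < len(out):
        out[code_index] = code
    return bytes(out)


def cobs_decode(frame: bytes) -> bytes:
    """Decode one COBS frame (no trailing delimiter)."""
    out = bytearray()
    i = 0
    while i < len(frame):
        code = frame[i]
        if code == 0:
            raise ValueError("unexpected 0x00 inside a frame")
        out += frame[i + 1:i + code]
        i += code
        if code != 0xFF and i < len(frame):
            out.append(0)  # implied zero between groups
    return bytes(out)


def make_stimulus(num_random, max_len, seed=1):
    """Build test frames plus the delimited byte stream a testbench would read."""
    rng = random.Random(seed)
    # Hand-picked edge cases: all zeros, exactly 254 and 255 non-zero bytes.
    frames = [bytes(5), bytes(range(1, 255)), bytes(range(1, 256))]
    for _ in range(num_random):
        length = rng.randint(1, max_len)
        # Bias toward zeros so they show up often enough to matter.
        frames.append(bytes(0 if rng.random() < 0.1 else rng.randrange(1, 256)
                            for _ in range(length)))
    stream = b"".join(cobs_encode(f) + b"\x00" for f in frames)
    return frames, stream


def check_round_trip(frames, stream):
    """Split the stream at delimiters, decode each frame, and compare."""
    encoded_frames = stream.split(b"\x00")[:-1]  # last piece is empty
    return [cobs_decode(f) for f in encoded_frames] == frames
```

In a real flow the stream would be written out for the VHDL testbench, and the testbench's output compared with the original frames; the self-check here only validates the Python reference model.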
>>
>> At this point, AI can write some code in some languages.
>>
>> This is a situation similar to algorithmic musical composition.
>> Algorithms can come up with ideas, sketches, or traverse a search
>> space more quickly than a human. They can help a composer, but do not
>> replace one. Algorithms can produce some sparkling gems of ideas,
>> fresh cuts and views, but these are mixed in with a lot of music
>> that is just plain wrong.
>>
>> If your programming job happens to be in a well-defined area that is
>> easily automated, then you have something to worry about today. All
>> jobs are a target for automation. Yes, all of them. As AI/ML becomes
>> more capable, it will automate even things like testbench writing (HDL
>> Coder already does something like this) and support code (like a
>> Python notebook that creates a solid set of test vectors for a
>> testbench) and design documents. All of us should be worrying about
>> some number of tomorrows in the future.
>>
>> When it comes to writing VHDL, that day is not yet "today" with ChatGPT.
>>
> --
-Michelle Thompson

"Potestatem obscuri lateris nescis."