PaperHub
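Overall rating: 6.0/10
Decision: Poster · 3 reviewers
Ratings: min 5, max 7, std. dev. 0.8
Individual ratings: 6, 7, 5
Confidence: 2.7
Correctness: 3.0 · Contribution: 3.0 · Presentation: 3.0
NeurIPS 2024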

Expert-level protocol translation for self-driving labs

Submitted: 2024-04-23 · Updated: 2024-11-06

Keywords
Self-Driving Laboratories, Domain-Specific Language, Structural Representation, Knowledge Externalization

Reviews and Discussion

Review
Rating: 6

This paper proposes an automated protocol translation framework, which takes natural language descriptions designed for human experimenters as input and outputs a structured representation that can be used for self-driving labs. The framework consists of a three-stage workflow. First, a domain-specific program is synthesized from the natural language description using a classic action/entity extraction and Expectation Maximization approach. Second, reagent flow analysis, essentially a reaching-definitions analysis of the synthesized program, is performed. Third, constraints prohibiting undesired execution behaviors are inferred and used to monitor the execution. Results show that the synthesized protocol translations match those written manually by human experimenters.

Strengths

  • The problem is well-motivated and of great significance in advancing AI applications in scientific discovery.
  • The paper is easy to follow, although the required background knowledge is non-trivial.
  • The empirical evaluation shows that the proposed approach outperforms pure LLM-based synthesis and matches the manual translation by human experimenters.

Weaknesses

  • The proposed solution is a portfolio of standard applications of existing tools or well-known algorithms, which is less interesting and novel from a machine learning perspective.
  • The targeted DSL is relatively simple, and the proposed solution consists of ad-hoc design choices (particularly spatial-temporal dynamics), which may not generalize well to DSLs with richer features.

Questions

What off-the-shelf tools are used in the pre-processing to extract actions and entities? Are they LLMs? Are there important differences between the extraction of action/entities and the extraction of reagent entities (state-of-the-art LLMs are used in the latter)?

Limitations

The authors briefly discussed limitations in Appendix E.

Comment

Below, we present several real-world examples to illustrate these distinctions. In our implementation of the pipeline, we employ a state-of-the-art dependency parser [1] alongside a state-of-the-art LLM-based NER model [2].

| original text | action extraction | entity extraction | classification with LLM | preprocess result | LLM-pure |
|---|---|---|---|---|---|
| Stain with DAPI nucleic acid stain for 30 seconds. | stain | ['DAPI nucleic acid stain', '30 seconds'] | [(property='reagent', value='DAPI nucleic acid stain'), (property='time', value='30 seconds')] | {"action": "stain", "output": "", "reagent": ["DAPI nucleic acid stain"], "time": ["30 seconds"]}; | {"action": "stain", "duration": ["30 seconds"], "reagent": ["DAPI nucleic acid stain"]}; |
| Purify CD4+ by magnetic isolation using the Auto MACS sorter (Miltenyi Biotec) using POSSELD2 program. | purify | ['the Auto MACS sorter (Miltenyi Biotec)', 'POSSELD2 program', 'CD4+'] | [(property='reagent', value='CD4+'), (property='device', value='the Auto MACS sorter (Miltenyi Biotec)'), (property='device', value='POSSELD2 program')] | {"action": "purify", "device": ["the Auto MACS sorter (Miltenyi Biotec)", "POSSELD2 program"], "output": "", "reagent": ["CD4+"]}; | {"action": "purify", "device": ["the Auto MACS sorter (Miltenyi Biotec)", "POSSELD2"], "method": ["magnetic isolation"], "reagent": ["CD4+"]}; |
| Measure baseline oxidative status every 20 s for at least 5 min, then add stimulating substances (e.g., thapsigargin). | measure, add | ['baseline oxidative status', 'every 20 s', '5 min'], ['stimulating substances'] | [Parameter(property='output', value='baseline oxidative status'), Parameter(property='time', value='every 20 s'), Parameter(property='time', value='5 min')], [(property='reagent', value='stimulating substances')] | {"action": "measure", "output": "baseline oxidative status", "time": ["every 20 s", "5 min"]}; {"action": "add", "reagent": ["stimulating substances"]}; | {"action": "measure", "output": "baseline oxidative status", "reagent": ["stimulating substances"], "time": ["every 20 s", "5 min"]}; |
| Spin the crude extracts by ultracentrifugation at 55000 RPM to properly pellet residual insoluble proteins from the extract. | spin | ['ultracentrifugation', '55000 RPM', 'residual insoluble proteins', 'the extract'] | [Parameter(property='device', value='ultracentrifugation'), Parameter(property='force', value='55000 RPM'), Parameter(property='reagent', value='residual insoluble proteins'), Parameter(property='container', value='the extract')] | {"action": "spin", "device": ["ultracentrifugation"], "force": ["55000 RPM"], "output": "", "reagent": ["residual insoluble proteins"], "time": [""]} | {"action": "spin", "reagent": ["crude extracts"], "method": ["ultracentrifugation"], "purpose": ["to properly pellet residual insoluble proteins from the extract"], "speed": ["55000 RPM"]} |
| Confirm positive colonies by transient transfection of sgRNAs analysis (SPH primers). | confirm | ['positive colonies', 'sgRNAs analysis (SPH primers)'] | [Parameter(property='output', value='positive colonies'), Parameter(property='reagent', value='sgRNAs analysis (SPH primers)')] | {"action": "confirm", "output": "positive colonies", "reagent": ["sgRNAs analysis (SPH primers)"]} | {"action": "confirm", "device": ["SPH primers"], "method": ["transient transfection of sgRNAs analysis"], "output": ["positive colonies"]} |

References:

[1] Honnibal, M. and Johnson, M. (2015). An improved non-monotonic transition system for dependency parsing. In Annual Conference on Empirical Methods in Natural Language Processing.

[2] Xie, T., Li, Q., Zhang, Y., Liu, Z., and Wang, H. (2024). Self-improving for zero-shot named entity recognition with large language models. arXiv preprint arXiv:2311.08921.

Comment

Here, we provide a realistic example of XDL machine-executable code used to operate the devices in a self-driving laboratory for chemistry [1].

Original protocol: 

2,2'-dinitro-6,6'-dimethylbiphenyl (39 g, 0.14 mol) was dissolved in 100 ml ethyl acetate in a hydrogenation vessel. Palladium on Carbon (10%, 5.5 g) was added. The system was evacuated and H2 added to a pressure of 28 psi. The reaction was left until no further uptake of H2 could be detected. The solution was filtered through celite and the solvent evaporated to give the product diamine in 100% yield.


Machine executable code:

<?xdl version="1.0.0" ?>
<XDL>

<Synthesis>

  <Hardware>
    <Component
      id="cartridge_celite"
      type="cartridge"
      chemical="celite" />
    <Component
      id="reactor"
      type="reactor" />
    <Component
      id="rotavap"
      type="rotavap" />
  </Hardware>

  <Reagents>
    <Reagent
      name="2,2'-dinitro-6,6'-dimethylbiphenyl"
      id="2,2'-dinitro-6,6'-dimethylbiphenyl"
      role="reagent" />
    <Reagent
      name="H2"
      id="H2"
      role="reagent" />
    <Reagent
      name="ethyl acetate"
      id="ethyl acetate"
      role="reagent" />
    <Reagent
      name="palladium on Carbon (10 %)"
      id="palladium on Carbon (10 %)"
      role="reagent" />
  </Reagents>

  <Procedure>
    <AddSolid
      vessel="reactor"
      reagent="2,2'-dinitro-6,6'-dimethylbiphenyl"
      mass="39 g" />
    <Dissolve
      vessel="reactor"
      solvent="ethyl acetate"
      volume="100 mL"
      temp="25 °C" />
    <AddSolid
      vessel="reactor"
      reagent="palladium on Carbon (10 %)"
      mass="5.5 g"
      stir="True" />
    <EvacuateAndRefill
      vessel="reactor" />
    <Add
      vessel="reactor"
      reagent="H2"
      volume="0"
      stir="True"
      speed="40.0" />
    <FilterThrough
      from_vessel="reactor"
      to_vessel="rotavap"
      through="celite" />
    <Evaporate
      vessel="rotavap"
      time="30 min" />
  </Procedure>

</Synthesis>

</XDL>
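
As a rough illustration of how a downstream tool might read such a file, the sketch below lists the steps of a `<Procedure>` block with Python's standard `xml.etree.ElementTree` module. The snippet is an abbreviated copy of two steps from the listing above, and `list_steps` is a hypothetical helper name; neither is part of the XDL toolchain.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <Procedure> section above (two steps only).
XDL_SNIPPET = """
<XDL>
<Synthesis>
  <Procedure>
    <AddSolid vessel="reactor" reagent="palladium on Carbon (10 %)" mass="5.5 g" stir="True" />
    <Evaporate vessel="rotavap" time="30 min" />
  </Procedure>
</Synthesis>
</XDL>
"""


def list_steps(xdl_text):
    """Return (operation name, attribute dict) pairs in execution order."""
    root = ET.fromstring(xdl_text)
    procedure = root.find("./Synthesis/Procedure")
    return [(step.tag, dict(step.attrib)) for step in procedure]


for name, params in list_steps(XDL_SNIPPET):
    print(name, params)
```

Each step is an element whose tag names the operation and whose attributes carry its parameters, so the order of the children is the execution order.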

References:

[1] S. Hessam M. Mehr et al., A universal system for digitization and automatic execution of the chemical synthesis literature. Science 370, 101-108 (2020).

Author Response

The proposed solution is a portfolio of standard applications of existing tools or well-known algorithms, which is less interesting and novel from a machine learning perspective.

Thanks for the comment. In this work, we study the problem of translating experimental protocols designed for human experimenters into formats suitable for machine execution. Our primary motivation is to bridge the existing gap between machine learning algorithms in the field of AI for science, such as molecular design, and the grounded experimental verification facilitated by self-driving laboratories. We appreciate the reviewer's recognition that "the required background knowledge is non-trivial". Indeed, conventional workflows for setting up self-driving laboratories and conducting physical experiments necessitate deep integration with domain experts, significantly impeding the progress of machine learning researchers in verifying and iterating their findings. Consequently, our framework aims to provide an infrastructure that enables these researchers to advance their machine learning algorithms and seamlessly validate their findings, thereby closing the loop of automatic scientific discovery.

To meet the requirements of such infrastructure, we conduct a systematic study to identify existing gaps in protocol translation between human experimenters and automatic translators in self-driving labs. Accordingly, we develop a three-stage framework that integrates cognitive insights from human experts with approaches from program synthesis, automata construction, and counterfactual analysis. At the syntax level, we synthesize the operation dependence graph to transform natural-language-based protocols into structured representations, thereby making explicit the operation-condition mappings and the control flows. At the semantics level, we analyze the reagent flow graph to reconstruct the complete lifecycles of intermediate products, addressing the latent, missing, or omitted properties and values. At the execution level, we contextualize both the operation dependence graph and the reagent flow graph within spatial and temporal dynamics, resulting in the protocol dependence graph. This graph supports counterfactual reasoning to detect potential conflicts or shortages of execution resources and to identify inappropriate combinations of operations in execution sequences.

The targeted DSL is relatively simple, and the proposed solution consists of ad-hoc design choices (particularly spatial-temporal dynamics), which may not generalize well to DSLs with richer features

Thanks for the comment. In this study, we aim to investigate the development of automatic translators for executing experimental protocols in self-driving labs. To execute the instructions on Internet-of-Things-connected hardware devices, such as valves, pumps, and reactors, protocols must ultimately be formatted in JSON-style configuration files, which represent a mainstream format in hardware-software communication (see C.3-1).

Although DSLs with syntactic and semantic features differing from those of JSON-style DSLs for self-driving labs are beyond the scope of this paper, generalizing the framework to DSLs with other language features represents a significant direction for future research. We appreciate the reviewer's suggestion in this regard.

We acknowledge that the core components of our framework are designed in an ad-hoc manner. The design guidelines are derived from both a systematic study of the required cognitive capabilities in protocol translation and established computer science theories. Rather than directly devising engineering solutions tailored to the specific problem, we decompose the problem into abstract subproblems, including symbolic regression, flow analysis, and counterfactual reasoning. We consider these subproblems as scientific challenges and develop solutions from the conceptual level to the implementation level in a top-down approach.

Specifically, the scientific challenge behind the design of spatial-temporal dynamics lies in the extremely long-tail distribution of historical run-time error cases. To address this issue, we propose leveraging foresight simulation by contextualizing individual operations within both the spatial dimension, i.e., the specific assignment of resources according to capacity requirements, and the temporal dimension, i.e., the specific precondition and postcondition of resources according to their properties. The division of spatial and temporal dimensions is mutually exclusive and echoes the two major aspects of computer programs: computation resources and control logic. Thus, although ad-hoc, the design of spatial-temporal dynamics targets the underlying scientific problem behind the superficial challenges and holds the potential for generalization to other DSLs. Extending the application scope of our framework is a significant direction for future work.
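
To make the two dimensions concrete, here is a minimal sketch of a foresight simulation, assuming each step declares a container, an added volume, and lists of preconditions and postconditions. The field names (`container`, `volume_ul`, `pre`, `post`), the functions, and the sample protocol are all hypothetical and only illustrate the idea; they are not the framework's actual data model.

```python
def check_capacity(steps, capacities):
    """Spatial check: flag steps whose running volume exceeds the container capacity."""
    levels = {}
    violations = []
    for index, step in enumerate(steps):
        container = step["container"]
        levels[container] = levels.get(container, 0.0) + step["volume_ul"]
        if levels[container] > capacities[container]:
            violations.append((index, container, levels[container]))
    return violations


def check_preconditions(steps):
    """Temporal check: flag steps that need a resource no earlier step produced."""
    available = set()
    violations = []
    for index, step in enumerate(steps):
        missing = [p for p in step.get("pre", []) if p not in available]
        if missing:
            violations.append((index, missing))
        available.update(step.get("post", []))
    return violations


# Hypothetical three-step protocol; volumes are in microlitres.
steps = [
    {"container": "tube", "volume_ul": 40.0, "pre": [], "post": ["reaction mixture"]},
    {"container": "tube", "volume_ul": 4.0, "pre": ["reaction mixture"], "post": ["precipitate"]},
    {"container": "tube", "volume_ul": 70.0, "pre": ["dissolved DNA"], "post": []},
]
capacity_issues = check_capacity(steps, {"tube": 50.0})
order_issues = check_preconditions(steps)
print(capacity_issues, order_issues)
```

Both checks run over the translated protocol before any hardware is involved, which is the point of simulating ahead of execution: the third step is flagged for overfilling the tube and for consuming a resource that was never produced.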

What off-the-shelf tools are used in the pre-processing to extract actions and entities? Are they LLMs? Are there important differences between the extraction of action/entities and the extraction of reagent entities?

Thanks for the question. We employ the SpaCy Dependency Parser to analyze the syntactic structure of protocols, which allows for the extraction of verbs and the identification of associated objects and modifiers. After parsing, these verbs are aligned with corresponding operational actions in our DSL by maximizing the cosine similarity between their word2vec representations and those of the DSL operations. Furthermore, we utilize a few-shot LLM-based model to identify and classify entities within the text. The rationale for integrating LLMs with classical parsing techniques lies in leveraging the advanced natural language processing capabilities of LLMs while mitigating their inherent uncertainties.
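
The alignment step can be sketched as an arg-max over cosine similarities. The three-dimensional vectors below are toy stand-ins for word2vec embeddings, and the operation list is a hypothetical subset of a DSL; both are assumptions made purely for illustration.

```python
import math

# Toy 3-dimensional vectors standing in for word2vec embeddings (hypothetical values).
TOY_VECTORS = {
    "spin": [0.9, 0.1, 0.0],
    "centrifuge": [0.8, 0.2, 0.1],
    "add": [0.0, 0.9, 0.1],
    "incubate": [0.1, 0.1, 0.9],
}

# Hypothetical DSL operation names.
DSL_OPERATIONS = ["centrifuge", "add", "incubate"]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align_verb(verb, operations=DSL_OPERATIONS, vectors=TOY_VECTORS):
    """Return the DSL operation whose vector is most similar to the verb's vector."""
    return max(operations, key=lambda op: cosine(vectors[verb], vectors[op]))


print(align_verb("spin"))
```

With these toy vectors, the parsed verb "spin" maps to the operation "centrifuge", since that pair has the highest cosine similarity.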

Significant differences exist between the various stages of this pipeline (see C.3-2).

Review
Rating: 7

The paper presents a framework for translating experimental protocols from natural language (NL) to machine-interpretable formats, specifically designed for self-driving laboratories. The proposed framework automates the protocol translation process through a three-stage workflow that constructs Protocol Dependence Graphs (PDGs) incrementally at the syntax, semantics, and execution levels. The approach is validated through quantitative and qualitative evaluations, demonstrating its performance on par with human experts.

Strengths

  • The paper introduces a novel, automated approach to protocol translation for self-driving laboratories, addressing a critical gap in the transition from AI-driven discoveries to empirical experimentation.

  • The paper is well-structured and clearly written.

Weaknesses

  • The proposed method requires substantial computational resources for training and execution, which might limit its accessibility for some research teams.
  • The paper could benefit from a more detailed analysis of the types of errors made by the automated translator compared to human experts, which would help understand the limitations and areas for improvement.

Questions

  • Can the authors provide more details on how the system handles ambiguous or incomplete protocol instructions that may be common in real-world scenarios?
  • What specific optimizations could be applied to reduce the computational requirements of the proposed framework?
  • How does the system ensure the safety and correctness of translated protocols, especially in high-stakes domains such as medical and clinical research?

Limitations

The authors have adequately addressed the limitations.

Comment

This series of examples demonstrates how our system tracks the required capacities at each step of the protocol by contextualizing the step into the spatial dimension.

| original text | execution level | key resources |
|---|---|---|
| Add 4 μl of 160 mM KMnO4 to radiolabeled DNA (40 ng, 5,000-10,000 cpm) in 40 μl total volume. | {"action": "add", "output": "reaction mixture", "reagent": ["160 mM KMnO4", "radiolabeled DNA"], "volume": ["4 μl", "40 μl"]} | "radiolabeled DNA" |
| Precipitate with ethanol. | {"action": "precipitate", "output": "precipitate", "reagent": ["ethanol"]} | "reaction mixture" |
| dissolve in 70 μl 10% piperidine, | {"action": "dissolve", "output": "dissolved DNA", "reagent": ["10% piperidine"], "volume": ["70 μl"]} | "precipitate" |
| incubate at 90 °C for 30 min | {"action": "incubate", "output": "incubated DNA", "temperature": ["90 °C"], "time": ["30 min"]} | "dissolved DNA" |
| Precipitate with ethanol | {"action": "precipitate", "output": "pellets", "reagent": ["ethanol"]} | "incubated DNA" |
| Wash pellets with 70% ethanol, dry, dissolve in 5 μl electrophoresis loading buffer. | {"action": "rinse", "output": "non-labeled DNA", "reagent": ["70% ethanol", "electrophoresis loading buffer"], "volume": ["5 μl"]} | "pellets" |
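
In the rows above, the key resource of each step is the output of the preceding step. A minimal sketch of that bookkeeping follows, under the simplifying assumption of a strictly linear chain of intermediate products. The `key_resources` helper is hypothetical and much simpler than a full reagent flow analysis.

```python
def key_resources(steps, initial):
    """Pair each step with the product it consumes: the previous step's output."""
    resources = []
    current = initial
    for step in steps:
        resources.append(current)
        if step["output"]:  # an empty output leaves the current product unchanged
            current = step["output"]
    return resources


# Actions and outputs copied from the execution-level column of the table above.
steps = [
    {"action": "add", "output": "reaction mixture"},
    {"action": "precipitate", "output": "precipitate"},
    {"action": "dissolve", "output": "dissolved DNA"},
    {"action": "incubate", "output": "incubated DNA"},
    {"action": "precipitate", "output": "pellets"},
    {"action": "rinse", "output": "non-labeled DNA"},
]
chain = key_resources(steps, initial="radiolabeled DNA")
print(chain)
```

The resulting list reproduces the "key resources" column. Protocols that branch or merge intermediate products would need a graph rather than a single running value.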

This series of examples illustrates how our system tracks the preconditions and postconditions at each step of the protocol by contextualizing the step into the temporal dimension.

| original text | execution level | key resources |
|---|---|---|
| Freeze cells for 1 hour at -80°C, thaw at 37°C for 1 hour. | {"action": "freeze", "output": "DLF_R004", "reagent": ["cells"], "time": ["1 hour"], "temperature": ["-80°C"]}, {"action": "thaw", "output": "DLF_R004", "reagent": ["cells"], "time": ["1 hour"], "temperature": ["37°C"]} | "DLF_R004" |
| If not using DLF_R004, lyse cells with lysis buffer. | {"action": "lyse", "output": "cell lysate", "reagent": ["lysis buffer"], "condition": ["not using DLF_R004"]} | "DLF_R004" |
| Prepare serological pipette by cutting at the 3 mL mark, sealing bottom with parafilm. | {"action": "prepare", "output": "modified pipette", "device": ["serological pipette"], "modification": ["cutting at the 3 mL mark", "sealing bottom with parafilm"]} | "lysis buffer" |
| Secure serological pipette to a vertical surface. | {"action": "secure", "output": "secured pipette", "device": ["serological pipette"]} | "modified pipette" |
| Fill pipette with at least 2.5 mL cell lysate, measure distance from 2 mL to 1 mL mark. | {"action": "fill", "output": "filled pipette", "volume": ["at least 2.5 mL"], "reagent": ["cell lysate"]} | "secured pipette" |
| Position cell phone camera to record pipette, drop a glass bead inside, repeat two more times. | {"action": "position", "output": "recorded experiment", "device": ["cell phone camera", "pipette"], "reagent": ["glass bead"]} | "filled pipette" |
| remove parafilm seal. | {"action": "remove", "output": "", "container": ["parafilm seal"]} | "recorded experiment" |
| rinse pipette | {"action": "rinse", "output": "cleaned pipette", "device": ["pipette"]} | "next sample" |
| repeat with next sample to obtain triplicates | {"action": "repeat", "output": "triplicates"} | "triplicates" |

Comment

The following example covers the completion of two types of parameters at the semantic level: determining the configuration parameter for an operation, where human experts rely on personal experimental experience, and inferring the required reagents for a step, where human experts use contextual reasoning. When the context is not sufficiently clear, human experts cannot infer the known unknowns within a single sentence.

| original text | semantic level - machine result | remarks |
|---|---|---|
| Add 700 μl of buffer RWT to the RNeasy MinElute spin column. | {"action": "add", "output": "", "reagent": ["<<<buffer RWT>>>"], "volume": ["700 µl"]} | known unknown |
| Discard the flow-through. | {"action": "discard", "output": "the flow-through", "volume": [""]} | |
| Discard the collection tube with the flow-through. | {"action": "discard", "output": "the flow-through", "container": ["the collection tube"], "volume": [""], "reagent": ["the flow-through"]} | |
| Transfer the RNeasy MinElute spin column into a new 2 ml collection tube (supplied). | {"action": "transfer", "output": "", "device": ["RNeasy MinElute"], "container": ["a new 2 ml collection tube (supplied)"], "volume": [""]} | |
| Open the lid of the spin column. | {"action": "open", "output": "", "container": ["<<<spin column>>>"]} | known unknown |
| Centrifuge at full speed (14,000 xg) to dry the membrane. | {"action": "centrifuge", "output": "", "speed": ["full speed (14,000 xg)"], "container": ["membrane"], "time": ["<<<5 min>>>"]} | unknown unknown |
| Discard the collection tube with the flow-through. | {"action": "discard", "output": "the flow-through", "container": ["the collection tube"], "volume": [""], "reagent": ["the flow-through"]} | |
| Transfer the RNeasy MinElute spin column into a new 1.5 ml collection tube. | {"action": "transfer", "output": "RNase-free water", "device": ["RNeasy MinElute"], "container": ["a new 1.5 ml collection tube (supplied)"], "volume": [""]} | |
| Add 14 μl RNase-free water directly to the center of the spin column membrane. | {"action": "add", "output": "", "reagent": ["<<<RNase-free water>>>"], "volume": ["14 µl"]} | known unknown |

| original text | semantic level - human result | remarks |
|---|---|---|
| Add 700 μl of buffer RWT to the RNeasy MinElute spin column. | {"action": "add", "output": "", "reagent": ["<<<NONE>>>"], "volume": ["700 µl"]} | known unknown |
| Discard the flow-through. | {"action": "discard", "output": "the flow-through", "volume": [""]} | |
| Discard the collection tube with the flow-through. | {"action": "discard", "output": "the flow-through", "container": ["the collection tube"], "volume": [""], "reagent": ["the flow-through"]} | |
| Transfer the RNeasy MinElute spin column into a new 2 ml collection tube (supplied). | {"action": "transfer", "output": "", "device": ["RNeasy MinElute"], "container": ["a new 2 ml collection tube (supplied)"], "volume": [""]} | |
| Open the lid of the spin column. | {"action": "open", "output": "", "container": ["<<<spin column>>>"]} | known unknown |
| Centrifuge at full speed (14,000 xg) to dry the membrane. | {"action": "centrifuge", "output": "", "speed": ["full speed (14,000 xg)"], "container": ["membrane"], "time": ["<<<5 min>>>"]} | unknown unknown |
| Discard the collection tube with the flow-through. | {"action": "discard", "output": "the flow-through", "container": ["the collection tube"], "volume": [""], "reagent": ["the flow-through"]} | |
| Transfer the RNeasy MinElute spin column into a new 1.5 ml collection tube. | {"action": "transfer", "output": "RNase-free water", "device": ["RNeasy MinElute"], "container": ["a new 1.5 ml collection tube (supplied)"], "volume": [""]} | |
| Add 14 μl RNase-free water directly to the center of the spin column membrane. | {"action": "add", "output": "", "reagent": ["<<<water>>>"], "volume": ["14 µl"]} | known unknown |
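
For readers who want to compare the machine and human results programmatically, the sketch below pulls out the values wrapped in `<<<...>>>`. It assumes those delimiters mark values completed at the semantic level, as the rows above suggest; the `inferred_values` helper is hypothetical.

```python
import json
import re

# Matches a value that is entirely wrapped in the <<< >>> delimiters.
MARKER = re.compile(r"^<<<(.*)>>>$")


def inferred_values(step_json):
    """Return {property: [values]} for values wrapped in <<< >>> markers."""
    step = json.loads(step_json)
    found = {}
    for key, value in step.items():
        items = value if isinstance(value, list) else [value]
        hits = []
        for item in items:
            match = MARKER.match(item)
            if match:
                hits.append(match.group(1))
        if hits:
            found[key] = hits
    return found


# Row copied from the table above.
centrifuge_row = (
    '{"action": "centrifuge", "output": "", "speed": ["full speed (14,000 xg)"], '
    '"container": ["membrane"], "time": ["<<<5 min>>>"]}'
)
print(inferred_values(centrifuge_row))
```

Applied to the first row of each table, the same helper returns `buffer RWT` for the machine result and `NONE` for the human result, which makes the difference between the two easy to spot.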

Comment

This series of examples demonstrates the superior performance of our system at the syntax level when processing relatively short sentences.

| original text | syntax level |
|---|---|
| Transfer 200 µl of serum into a 1.5 ml microfuge tube. | {"action": "transfer", "output": "buffer", "reagent": ["serum"], "container": ["a 1.5 ml microfuge tube"], "volume": ["200 µl"]} |
| Add 200 µl of buffer XBP. | {"action": "add", "output": "the sample/XBP mixture", "reagent": ["buffer"], "volume": ["200 µl"]} |
| Add the sample/XBP mixture onto the exoEasy spin column. | {"action": "add", "output": "", "reagent": ["the sample/XBP mixture"]} |
| Add 10 ml of buffer XWP to the exoEasy spin column. | {"action": "add", "output": "", "reagent": ["buffer XWP"], "volume": ["10 ml"]} |
| Transfer the spin column to a fresh collection tube. | {"action": "transfer", "output": "Qiazol", "container": ["a fresh collection tube"]} |
| Add 700 μl Qiazol to the membrane of the spin column. | {"action": "add", "output": "", "reagent": ["Qiazol"], "volume": ["700 μl"]} |
| Transfer the flow-though, which is the lysate, to a PLG tube. | {"action": "transfer", "output": "flow-though", "container": ["PLG tube"]} |
| Incubate at room temperature for 5 minutes. | {"action": "incubate", "output": "", "temperature": ["room temperature"], "time": ["5 minutes"]} |
| Add 90 µl chloroform. | {"action": "add", "output": "", "volume": ["90 µl"]} |

This series of examples illustrates the challenges faced with longer sentences due to the diversity of actions and the multiple parameters.

| original text | syntax level | remarks |
|---|---|---|
| Rinse MSC with 2 ml PBS. | {"action": "rinse", "output": "MSC with 2 ml PBS", "reagent": ["PBS"], "volume": ["2 ml"]} | |
| trypsinize with 0.5 ml trypsin, | {"action": "trypsinize", "output": "", "reagent": ["trypsin"], "volume": ["0.5 ml"]} | |
| transfer to 15 ml tube, | {"action": "transfer", "output": "", "container": ["15 ml tube"]} | |
| add 10 ml DMEM with 10% FBS | {"action": "add", "output": "", "reagent": ["DMEM", "FBS"], "volume": ["10 ml"]} | |
| Incubate overnight at 37 °C, 5% CO2. | {"action": "incubate", "output": "", "temperature": ["37 °C"]} | Lack of parameter |
| Wash cells twice with 2 ml PBS, add osteogenic differentiation medium. | {"action": "wash", "output": "", "volume": ["2 ml PBS"], "reagent": ["osteogenic differentiation medium"]} | Lack of action in single sentence |
| change medium every 2 days for 10 days | {"action": "change", "output": "Alizarin red S", "time": ["every 2 days for 10 days"]} | |
| At day 10, stain with Alizarin red S for 5 min. | {"action": "stain", "output": "", "reagent": ["Alizarin red S"], "time": ["5 min"]} | |

Comment

Tab. 2-1-5: Execution level - Capacity of resources

| original text | execution level | reagent flow graph |
|---|---|---|
| Prepare annealing solution of 50 µM RNA/DNA oligos with 50 mM NaCl in DNase/RNase-free water, aliquot 50 µl in PCR tube. | {"action": "prepare", "output": "annealing solution", "concentration": ["50 µM RNA/DNA oligos", "50 mM NaCl"], "reagent": ["DNase/RNase-free water"], "volume": ["50 µl"], "container": ["PCR tube"]} | in: DNase/RNase-free water (50 µl), RNA/DNA oligos (50 µM), NaCl (50 mM); out: annealing solution (50 µl) |
| Dissolve inhibitor compound in DMSO to 10 mM, if needed, prepare serial dilutions in Milli-Q water. | {"action": "dissolve", "output": "inhibitor compound solution", "reagent": ["inhibitor compound", "DMSO"]} | in: inhibitor compound, DMSO; out: inhibitor compound solution (volume depends on dilution) |
| Add water (20 µl in blanks, 10 µl in controls) to 96-well plate. | {"action": "add", "output": "water in wells", "reagent": ["water"], "container": ["96-well plate"]} | in: water (20 µl for blanks, 10 µl for controls); out: water in 96-well plate (20 µl in blanks, 10 µl in controls) |
| Add 80 µl RT reaction mix (1.25x). | {"action": "add", "output": "RT reaction mix in wells", "volume": ["80 µl"]} | in: RT reaction mix (80 µl); out: RT reaction mix in 96-well plate (80 µl) |
| Add 10 µl inhibitor dilution to samples, to each well. | {"action": "add", "output": "samples with inhibitor", "volume": ["10 µl"], "reagent": ["inhibitor dilution"]} | in: inhibitor dilution (10 µl); out: samples with inhibitor (10 µl) |
| Stop reaction with 50 µl EDTA (0.5 M, pH 8.0). | {"action": "stop", "output": "stopped reaction", "reagent": ["EDTA"], "volume": ["50 µl"]} | in: EDTA (50 µl); out: stopped reaction with EDTA (50 µl) |
| Quantify reaction with Victor 3 at 490/528 nm, report inhibitor values as percentage of control. | {"action": "quantify", "output": "quantified reaction", "device": ["Victor 3"]} | in: reaction; out: quantified reaction at 490/528 nm |
| Subtract blank value from samples. | {"action": "subtract", "output": "corrected samples", "reagent": ["blank value"]} | in: blank value, samples; out: corrected sample values |
| Calculate IC50 value as the concentration reporting 50% reduction of signal compared to control. | {"action": "calculate", "output": "IC50 value", "reagent": ["signal"]} | in: signal; out: IC50 value |

Tab. 2-1-6: Execution level - Safety of operations

| original text | execution level | reagent flow graph |
|---|---|---|
| Replace medium after 12 hours (Day 2). | {"action": "replace", "output": "medium replaced", "container": ["medium"], "volume": [""]} | in: old medium; out: new medium |
| Digest mESCs with 0.05% trypsin, prepare for FACS into 96-well plates (Day 10). | {"action": "digest", "output": "mESCs", "reagent": ["0.05% trypsin"], "container": ["96-well plates"]} | in: mESCs, 0.05% trypsin; out: digested mESCs (ensure trypsin is neutralized to avoid over-digestion) |
| Remove single colonies from 96-well plates to 24-well plates. | {"action": "remove", "output": "single colonies", "container": ["96-well plates", "24-well plates"]} | in: single colonies; out: single colonies in 24-well plates |
| Confirm positive colonies by transient transfection of sgRNAs analysis (SPH primers) (Day 14-15). | {"action": "confirm", "output": "positive colonies", "reagent": ["SPH primers"]} | in: single colonies, SPH primers; out: positive colonies |
| Replace medium after 12 hours (Day 2). | {"action": "replace", "output": "medium replaced", "container": ["medium"], "volume": [""]} | in: old medium; out: new medium |
| Sort single cells into 96-well plates by FACS. | {"action": "sort", "output": "single cells", "device": ["FACS"], "container": ["96-well plates"]} | in: single cells; out: sorted single cells in 96-well plates (ensure proper calibration of FACS to avoid sorting errors) |
| Confirm insertion by PCR (Day 18). | {"action": "confirm", "output": "insertion confirmed"} | in: single cells; out: confirmed insertion |
| Remove single colonies from 96-well plates to 24-well plates. | {"action": "remove", "output": "single colonies", "container": ["24-well plates"]} | in: single colonies; out: single colonies in 24-well plates |
| Confirm positive colonies by PCR (Day 22). | {"action": "confirm", "output": "positive colonies"} | in: single colonies; out: positive colonies |
| Measure fluorescent intensity of colonies by FACS, take fluorescence images under confocal microscope (Day 27). | {"action": "take", "output": "fluorescence images", "device": ["confocal microscope"], "container": ["colonies"]} | in: colonies; out: fluorescence images (handle samples to avoid photobleaching) |

Comment

Tab. 2-1-4: Semantic level - Latent semantics of unknown unknowns

original textsemantic levelunknown unknowns
Harvest approximately 1×10<sup>7</sup> cells by centrifugation for 5 min.{"action": "harvest", "output": "", "device": ["centrifugation"], "force": ["<<<2000 RPM>>>"], "time": ["5 min"]}"<<<2000 RPM>>>"
Cell lysates are homogenized by passing through 22-gauge needles.{"action": "homogenize", "output": "", "reagent": ["cell lysates"]}
Tubes are put on ice for 15 min to complete the lysis.{"action": "incubate", "output": "", "container": ["tubes"], "time": ["15 min"], "temperature": ["on ice"]}
Crude extracts are then centrifuged.{"action": "centrifuge", "output": "", "force": ["<<<2500 RPM>>>"], "time": ["<<<5 min>>>"]}"<<<5 min>>>"
Supernatants are transferred to fresh centrifuge tubes.{"action": "transfer", "output": "", "container": ["fresh centrifuge tubes"]}
Cold 5 M NaCl is added to each sample to make a salt concentration of between 0.7 – 1.0 M to disrupt protein-protein interactions.{"action": "add", "output": "sample with NaCl", "container": ["each sample"], "reagent": ["5 M NaCl"], "concentration": ["0.7 – 1.0 M"]}
Spin the crude extracts by ultracentrifugation to properly pellet residual insoluble proteins from the extract.{"action": "spin", "output": "Hypotonic Buffer", "device": ["ultracentrifugation"], "force": ["<<<55000 RPM>>>"], "reagent": ["residual insoluble proteins"]}"<<<55000 RPM>>>"
Transfer supernatants into fresh centrifuge tubes.{"action": "transfer", "output": "", "reagent": ["supernatants"], "container": ["fresh centrifuge tubes"]}
Rinse Protein A beads in Hypotonic Buffer until ready for use.{"action": "rinse", "output": "use", "reagent": ["Hypotonic Buffer"]}
Take a volume of cell lysates (prepared as described above).{"action": "take", "output": "Hypotonic Buffer", "volume": ["cell lysates"]}
Dilute with Hypotonic Buffer to 250 – 500 mM salt to enable protein-protein interactions.{"action": "dilute", "output": "antibody", "reagent": ["Hypotonic Buffer"]}
Add 2 µg of preclearing antibody to the diluted lysate (e.g., anti), vortex, add 50 µL of Protein A beads.{"action": "add", "output": "polyclonal anti-MEKK1", "reagent": ["antibody", "Protein A beads"]}
Add 2 µg of polyclonal anti-MEKK1 to the lysates, add 50 µL of Protein A beads at 4 °C for 1 h.{"action": "add", "output": "", "reagent": ["polyclonal anti-MEKK1"], "container": ["the lysates"], "temperature": ["4 °C"], "time": ["1 h"]}
Touchspin beads, wash beads with hypotonic buffer (supplemented with NaCl).{"action": "wash", "output": "", "reagent": ["hypotonic buffer"], "concentration": ["<<<300 mM>>>"]}"<<<300 mM>>>"
In total, 3 – 5 washes of the beads are performed.{"action": "perform", "output": "", "reagent": ["hypotonic buffer"], "frequency": ["3 – 5"]}
Finally, wash once with Hypotonic Buffer.{"action": "wash", "output": "", "reagent": ["Hypotonic Buffer"]}
Purified MEKK1 may be stored by snap-freezing in liquid nitrogen.{"action": "store", "output": "M", "method": ["snap-freezing"], "reagent": ["liquid nitrogen"]}
Following preparation of MEKK1 immunoprecipitates (as above), incubate with 7 µg of JNKK1(K131M) along with 5 µCi of [γ-<sup>32</sup>P]ATP for 30 min.{"action": "incubate", "output": "", "reagent": ["JNKK1(K131M)", "[γ-<sup>32</sup>P]ATP"], "container": ["<<<Kinase Assay Buffer>>>"], "temperature": ["<<<30 °C>>>"], "time": ["30 min"]}"<<<Kinase Assay Buffer>>>", "<<<30 °C>>>"
Comment

Tab. 2-1-2: Syntax level - Operation control flows

| original text | syntax level | action | control flows |
| --- | --- | --- | --- |
| Centrifuge the cell suspension at 200 x g at room temperature for 5 min. | {"action": "centrifuge", "output": "cell pellet", "temperature": ["room temperature"], "time": ["5 min"], "force": ["200 x g"]} | Centrifuge | Linear |
| Remove the supernatant. | {"action": "remove", "output": "supernatant"} | Remove | Linear |
| Suspend the cell pellet with 2 ml ACK lysing buffer for 1 min to deplete red blood cells. | {"action": "suspend", "output": "depleted cell suspension", "volume": ["2 ml"], "reagent": ["ACK lysing buffer"], "time": ["1 min"]} | Suspend | Linear |
| If red blood cells are not completely depleted, repeat the ACK lysing buffer step until they are. | {"action": "repeat", "output": "", "reagent": ["ACK lysing buffer"], "condition": ["if red blood cells are not completely depleted"]} | Repeat | Non-linear |
| Filter the cell suspension through a 40 μm nylon strainer. | {"action": "filter", "output": "filtered cell suspension", "container": ["40 μm nylon strainer"]} | Filter | Linear |
| Wash the strainer with 2 ml 1x DPBS for 5 min. | {"action": "wash", "output": "", "container": ["strainer"], "reagent": ["1x DPBS"], "volume": ["2 ml"], "time": ["5 min"]} | Wash | Linear |
| Wash the cell pellet with 1x DPBS with 20 ng/ml murine M-CSF in a 100 mm Petri dish. | {"action": "wash", "output": "", "reagent": ["1x DPBS with 20 ng/ml murine M-CSF"], "container": ["100 mm Petri dish"]} | Wash | Linear |
| Suspend in 15 ml complete DMEM medium. | {"action": "suspend", "output": "cell suspension in DMEM", "volume": ["15 ml"], "reagent": ["complete DMEM medium"]} | Suspend | Linear |
| Incubate at 37 °C, 5% CO2. | {"action": "incubate", "output": "incubated cells", "temperature": ["37 °C"], "environment": ["5% CO2"]} | Incubate | Linear |
| After 3 days, replace half of the medium with fresh complete DMEM medium. | {"action": "replace", "output": "", "reagent": ["fresh complete DMEM medium"], "time": ["after 3 days"]} | Replace | Linear |
| Repeat this step every 2 days. | {"action": "repeat", "output": "", "reagent": ["fresh complete DMEM medium"], "frequency": ["every 2 days"]} | Repeat | Non-linear |
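
The linear/non-linear labels in Tab. 2-1-2 are consistent with a simple key-based rule. The following is a minimal sketch, assuming (this is an illustration, not the rule used in the paper) that a step is non-linear exactly when its structured form carries a `condition` or `frequency` key. The names `NON_LINEAR_KEYS` and `control_flow` are hypothetical.

```python
import json

# Assumption for illustration: these keys signal branching or looping.
NON_LINEAR_KEYS = {"condition", "frequency"}


def control_flow(step: dict) -> str:
    """Label one structured step as 'Linear' or 'Non-linear'."""
    return "Non-linear" if NON_LINEAR_KEYS.intersection(step) else "Linear"


# Three rows copied from the table above.
centrifuge = json.loads(
    '{"action": "centrifuge", "output": "cell pellet", '
    '"temperature": ["room temperature"], "time": ["5 min"], "force": ["200 x g"]}'
)
repeat_until = json.loads(
    '{"action": "repeat", "output": "", "reagent": ["ACK lysing buffer"], '
    '"condition": ["if red blood cells are not completely depleted"]}'
)
repeat_every = json.loads(
    '{"action": "repeat", "output": "", "reagent": ["fresh complete DMEM medium"], '
    '"frequency": ["every 2 days"]}'
)

labels = [control_flow(s) for s in (centrifuge, repeat_until, repeat_every)]
```

A rule like this only recovers the label; parsing the loop body and its exit condition from the sentence is the harder part that the syntax-level stage has to handle.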

Tab. 2-1-3: Semantic level - Latent semantics of known unknowns

| original text | semantic level | known unknowns |
| --- | --- | --- |
| Transfer the sample (plasma, , cell suspension) into a glass centrifuge vial. | {"action": "transfer", "output": "", "reagent": ["the sample (plasma, , cell suspension)"], "container": ["a glass centrifuge vial"]} | |
| Adjust the volume to 1 ml with PBS. | {"action": "modify", "output": "heparinized blood.1 ml medium", "volume": ["<<<1 ml>>>"], "length": [""], "device": [""]} | "1 ml" |
| 50-200 µl plasma was taken from heparinized blood.1 ml medium. | {"action": "take", "output": "", "reagent": ["heparinized blood.1 ml medium"]} | |
| Plasma was directly taken from cell culture. | {"action": "take", "output": "a plasma sample", "reagent": [""]} | |
| Add 10 µl of the internal standard (10 μM C17-S1P in MeOH). Add 300 µl of 18.5% HCl. | {"action": "add", "output": "", "reagent": ["18.5% HCl"], "volume": ["<<<300 µl>>>"]} | "300 µl" |
| As an example, S1P extraction from a plasma sample is shown in step A7. | {"action": "show", "output": "step A7", "reagent": ["a plasma sample"]} | |
| The CHCl3-phase is extracted by directly pipetting through the upper aqueous phase. | {"action": "extract", "output": "the CHCl3", "container": ["the upper aqueous phase"], "reagent": ["step A7"]} | |
| Add this CHCl3-phase to the transferred CHCl3-phase of step A7. | {"action": "add", "output": "", "reagent": ["this CHCl3-phase"]} | |
| Vacuum-dry the CHCl3 in the vacuum rotator at 60 °C for 45 min. | {"action": "rinse", "output": "", "reagent": ["<<<the CHCl3>>>"], "temperature": ["60 °C"], "time": ["<<<45 min>>>"]} | "the CHCl3", "45 min" |
| Alternatively, the samples can be dried under nitrogen gas flow. | {"action": "dry", "output": "", "reagent": ["samples"], "time": ["1-20 min"]} | |
| Re-equilibrate with 90% solution A. | {"action": "equilibrate", "output": "S1P", "concentration": ["90% solution"], "volume": [""]} | |
| S1P is analyzed with the mass transition 380 m/z -> 264 m/z. For quantitative analysis, a standard curve with S1P amounts of 1 pmol to 100 pmol as the internal standard is generated. | {"action": "examine", "output": "quantitative analysis", "reagent": ["S1P"]} | |
Comment

Tab. 2-1-1: Syntax level - Operation-condition mapping

| original text | syntax level | action | conditions |
| --- | --- | --- | --- |
| Spin media at 500-1,000 x g for 10 min (optional), pre-x g for 10 min, filter with 0.22 µm PES membrane, freeze at -80°C. | {"action": "spin", "output": "filtered media", "speed": ["500-1,000 x g"], "time": ["10 min"]}, {"action": "filter", "output": "filtered media", "device": ["0.22 µm PES membrane"]}, {"action": "freeze", "output": "frozen media", "temperature": ["-80°C"]} | Spin, Filter, Freeze | Speed: 500-1,000 x g, Time: 10 min, Device: 0.22 µm PES membrane, Temperature: -80°C |
| Thaw 4 ml supernatant on ice, add 4 ml XBP buffer. | {"action": "thaw", "output": "thawed supernatant", "volume": ["4 ml"], "reagent": ["supernatant"]}, {"action": "add", "output": "sample/XBP mix", "reagent": ["XBP buffer"], "volume": ["4 ml"]} | Thaw, Add | Volume: 4 ml, Temperature: On ice |
| Add sample/XBP mix to exoEasy maxi spin column, centrifuge 1-3 min at 500 x g, discard flow-through. | {"action": "add", "output": "flow-through", "reagent": ["sample/XBP mix"], "container": ["spin column"]}, {"action": "centrifuge", "output": "flow-through", "speed": ["500 x g"], "time": ["1-3 min"]}, {"action": "discard", "output": "", "reagent": ["flow-through"]} | Add, Centrifuge, Discard | Container: Spin column, Speed: 500 x g, Time: 1-3 min |
| Add 10 ml XWP to spin column, centrifuge 5 min at 5,000 x g, transfer column to fresh collection tube. | {"action": "add", "output": "", "reagent": ["XWP"], "volume": ["10 ml"]}, {"action": "centrifuge", "output": "", "speed": ["5,000 x g"], "time": ["5 min"], "container": ["spin column"]}, {"action": "transfer", "output": "", "container": ["fresh collection tube"]} | Add, Centrifuge, Transfer | Volume: 10 ml, Speed: 5,000 x g, Time: 5 min, Container: Spin column, Fresh collection tube |
| Add 700 µL Qiazol to spin column, centrifuge 5 min at 5,000 x g, spin PLG tubes 30 s at 16,000 x g. | {"action": "add", "output": "", "reagent": ["Qiazol"], "volume": ["700 µL"]}, {"action": "centrifuge", "output": "", "speed": ["5,000 x g"], "time": ["5 min"], "container": ["spin column"]}, {"action": "spin", "output": "", "speed": ["16,000 x g"], "time": ["30 s"], "container": ["PLG tubes"]} | Add, Centrifuge, Spin | Volume: 700 µL, Speed: 5,000 x g, Time: 5 min, Speed: 16,000 x g, Time: 30 s, Container: Spin column, PLG tubes |
| Add flow-through to PLG tube, vortex 5 s, incubate 5 min at RT. | {"action": "add", "output": "", "reagent": ["flow-through"], "container": ["PLG tube"]}, {"action": "vortex", "output": "", "time": ["5 s"]}, {"action": "incubate", "output": "", "time": ["5 min"], "temperature": ["RT"]} | Add, Vortex, Incubate | Container: PLG tube, Time: 5 s, Time: 5 min, Temperature: RT |
| Add 90 µL chloroform. | {"action": "add", "output": "", "volume": ["90 µL"], "reagent": ["chloroform"]} | Add | Volume: 90 µL |
| Shake vigorously for 15 s, incubate 2-3 min at RT. | {"action": "shake", "output": "", "time": ["15 s"]}, {"action": "incubate", "output": "", "time": ["2-3 min"], "temperature": ["RT"]} | Shake, Incubate | Time: 15 s, Time: 2-3 min, Temperature: RT |
| Centrifuge 15 min at 12,000 x g, transfer upper aqueous phase to new tube. | {"action": "centrifuge", "output": "upper aqueous phase", "speed": ["12,000 x g"], "time": ["15 min"]}, {"action": "transfer", "output": "upper aqueous phase", "container": ["new tube"]} | Centrifuge, Transfer | Speed: 12,000 x g, Time: 15 min, Container: New tube |
| Add 2 volumes 100% ethanol, mix. | {"action": "add", "output": "ethanol mixture", "volume": ["2 volumes"], "reagent": ["100% ethanol"]}, {"action": "mix", "output": "", "reagent": ["ethanol mixture"]} | Add, Mix | Volume: 2 volumes |
| Add mix to MinElute spin column, centrifuge 15 s at 1,000 x g, discard flow-through, repeat until all sample is used. | {"action": "add", "output": "", "reagent": ["ethanol mixture"], "container": ["MinElute spin column"]}, {"action": "centrifuge", "output": "", "speed": ["1,000 x g"], "time": ["15 s"]}, {"action": "discard", "output": "", "reagent": ["flow-through"]}, {"action": "repeat", "output": "", "condition": ["until all sample is used"]} | Add, Centrifuge, Discard, Repeat | Container: MinElute spin column, Speed: 1,000 x g, Time: 15 s, Condition: Until all sample is used |
| Wash column with 700 µL Buffer RWT, centrifuge 15 s at ≥8,000. | {"action": "wash", "output": "", "reagent": ["Buffer RWT"], "volume": ["700 µL"]}, {"action": "centrifuge", "output": "", "speed": [">=8,000"], "time": ["15 s"]} | Wash, Centrifuge | Volume: 700 µL, Speed: ≥8,000, Time: 15 s |
| Wash twice with 500 µL Buffer RPE, centrifuge 15 s at ≥8,000. | {"action": "wash", "output": "RNase-free", "reagent": ["Buffer RPE"], "volume": ["500 µL"]}, {"action": "centrifuge", "output": "RNase-free", "speed": [">=8,000"], "time": ["15 s"]} | Wash, Centrifuge | Volume: 500 µL, Speed: ≥8,000, Time: 15 s |
Author response

What specific optimizations could be applied to reduce the computational requirements of the proposed framework?

Thanks for the question. Computational efficiency is always a topic of interest when evaluating new computational frameworks. Consider a new incoming protocol with $k$ steps, each configured by a constant number of parameters, denoted $\epsilon$. At the syntax level, the primary computational bottleneck arises during DSL program synthesis, where the EM algorithm exhibits a worst-case complexity of $O(\epsilon^k)$. This is a highly conservative estimate, as mainstream optimization approaches can solve the EM much more efficiently. At the semantics level, the bottleneck occurs during reagent flow analysis, which has $O(k^2)$ complexity. Notably, only approximately 10% of the steps are included in the nested loop for reagent flow construction, as about 90% of the steps are linearly connected. At the execution level, the protocol execution model also exhibits $O(k^2)$ complexity, encompassing both forward and backward tracing. This can be optimized by replacing the full tracing strategy with a sliding window built upon the topological dependencies between steps.
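
To make the quadratic bound at the semantics level concrete, here is a minimal sketch of reagent flow construction. It assumes each step is given as an `{"in": [...], "out": [...]}` record and that reagents are matched by exact string equality; the function name `build_reagent_flow` is hypothetical and this is not the implementation from the paper.

```python
def build_reagent_flow(steps):
    """Return edges (producer_index, consumer_index, reagent).

    For every reagent a step consumes, scan backwards for the most recent
    step that produced it. With k steps and a constant number of reagents
    per step, the nested scan costs O(k^2) in the worst case.
    """
    edges = []
    for j, step in enumerate(steps):
        for reagent in step["in"]:
            for i in range(j - 1, -1, -1):  # backward tracing
                if reagent in steps[i]["out"]:
                    edges.append((i, j, reagent))
                    break  # nearest producer found
    return edges


# Two-step example in the in/out format used in the case-study tables.
steps = [
    {"in": ["pre-hybr soln", "the hybridization reaction"], "out": ["hybrid molecule"]},
    {"in": ["hybrid molecule"], "out": []},
]
edges = build_reagent_flow(steps)
```

When consecutive steps are linearly connected, the backward scan stops after one iteration, which is why the worst case is rarely reached in practice.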

Although the complexities of the algorithms at these three levels are tractable, there is substantial room for improving the efficiency of the framework. Investigating methods to speed up the translation process for protocols with extremely high complexity would be a valuable area of research. We are committed to making the computational framework as accessible as possible for all research teams.

In addition, our proposed framework functions as an auxiliary module for LLMs, supporting the use of off-the-shelf LLMs such as GPT and Llama without the need for domain-specific fine-tuning. The costs of calling commercial LLM APIs are quite affordable. We selected OpenAI's gpt-3.5-turbo-0125 model for our experiments. Across 75 test protocols, we executed 1816 queries to achieve syntax-level translation, resulting in structured protocols. At the semantic level, we conducted 4062 queries for completion tasks (including translating protocols retrieved from the training dataset). In total, our expenditure was approximately 17 USD.
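
The figures above imply the following rough per-query and per-protocol costs. This is a back-of-the-envelope calculation from the reported totals only; the per-protocol number is approximate because the semantic-level queries also cover protocols retrieved from the training dataset.

```python
syntax_queries = 1816      # syntax-level translation queries
semantic_queries = 4062    # semantic-level completion queries
total_cost_usd = 17.0      # approximate total expenditure
test_protocols = 75

total_queries = syntax_queries + semantic_queries
cost_per_query = total_cost_usd / total_queries
cost_per_protocol = total_cost_usd / test_protocols

print(f"{total_queries} queries, "
      f"{cost_per_query:.4f} USD per query, "
      f"{cost_per_protocol:.2f} USD per test protocol")
```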

How does the system ensure the safety and correctness of translated protocols, especially in high-stakes domains such as medical and clinical research?

Thanks for the question. In general, ensuring the safety and correctness of translated protocols in high-stakes domains is an exceptionally challenging task. Several factors contribute to these challenges, including accurately mapping operations to their corresponding configuration parameters, precisely parsing control flows from natural language, completing latent semantics with domain-specific knowledge, inferring missing or omitted key information, tracking resource capacities, and verifying the safety of run-time execution of experiments. Even minor errors in these areas can significantly compromise the safety and correctness of translated protocols. Consequently, we have made specific efforts in response to these challenges.

At the syntax level, we synthesize the operation dependence graph to transform natural-language-based protocols into structured representations. This approach makes operation-condition mappings and control flows explicit. At the semantics level, we analyze the reagent flow graph to reconstruct the complete lifecycles of intermediate products, thereby addressing latent, missing, or omitted properties and values. At the execution level, we contextualize both the operation dependence graph and the reagent flow graph within spatial and temporal dynamics, resulting in a protocol dependence graph. This graph facilitates counterfactual reasoning to identify potential conflicts or shortages of execution resources and inappropriate combinations of operations within execution sequences.

We provide several illustrative examples to demonstrate these concepts (see C.2-1).

The paper could benefit from a more detailed analysis of the types of errors made by the automated translator compared to human experts, which would help understand the limitations and areas for improvement.

Thanks for the suggestion. Here we present a detailed analysis of the errors made by our proposed automatic translator compared to human experts. We discuss the potential improvements of the translator accordingly.

At the syntax level, the major difference lies in the analysis of long natural-language sentences. Human experts identify the parameters of single or multiple actions in long sentences with ease, whereas our approach sometimes mismatches actions and their parameters (see C.2-2-1).

At the semantic level, when supplementing known unknowns, human experts tend to infer parameters based on established protocols outside their expertise; when supplementing unknown unknowns, human experts tend to transfer their knowledge from familiar domains to protocols in various fields. Our system, however, completes parameters based on all collected protocols, which is essentially the opposite of the transfer process used by human experts (see C.2-2-2).

At the execution level, human experts track capacity primarily based on prior knowledge, subsequently using context to judge the appropriateness of the equipment used. In contrast, the machine extracts the entire flow process, enabling it to calculate each step and ensure that the capacity tracking is scientifically sound and reasonable (see C.2-2-3).
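
Step-by-step capacity tracking can be sketched as follows. This is a minimal illustration, assuming each step has already been reduced to a `(container, added_volume_ml)` pair and that container capacities are known. The function name `track_capacity` and the 5 ml tube capacity are hypothetical and chosen only to trigger a violation.

```python
def track_capacity(steps, capacities_ml):
    """Replay added volumes per container and report overflows.

    steps: list of (container, added_volume_ml) in execution order.
    Returns a list of (step_index, container, level_ml) violations.
    """
    levels = {name: 0.0 for name in capacities_ml}
    violations = []
    for idx, (container, volume_ml) in enumerate(steps):
        levels[container] += volume_ml
        if levels[container] > capacities_ml[container]:
            violations.append((idx, container, levels[container]))
    return violations


# 4 ml supernatant followed by 4 ml buffer, in a (hypothetical) 5 ml tube.
violations = track_capacity(
    steps=[("tube", 4.0), ("tube", 4.0)],
    capacities_ml={"tube": 5.0},
)
```

A human reader would typically notice such a mismatch from experience; the sketch shows how replaying the whole flow surfaces it mechanically at the exact step where it occurs.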

Can the authors provide more details on how the system handles ambiguous or incomplete protocol instructions that may be common in real-world scenarios?

Thanks for the question. Here, we present a series of case studies to elucidate the specific behaviors of components within the proposed three-stage framework at the syntax, semantics, and execution levels (see C.2-3).

Comment

| original text | w/ stage1, w/o stage2&3 | utility of stage1 | w/ stage1&2, w/o stage3 | utility of stage2 | w/ stage1&2&3 | utility of stage3 |
| --- | --- | --- | --- | --- | --- | --- |
| Transfer the clear supernatant to <MASK>. Incubate at 4 °C with rotation. | {"action": "transfer", "output": "the clear supernatant", "container": ["<MASK>"]}; {"action": "incubate", "output": "", "temperature": ["4 °C"], "reagent":[]}; | {"in": ["the clear supernatant"], "out": ["<MASK>"]}; {"in": ["<MASK>"], "out": []}; | {"action": "transfer", "output": "the clear supernatant", "container": ["a new tube"]}; {"action": "incubate", "output": "", "temperature": ["4 °C"], "reagent":[]}; | Latent semantics of unknown unknowns (container); | {"action": "transfer", "output": "the clear supernatant", "container": ["a new tube"]}; {"action": "incubate", "output": "", "temperature": ["4 °C"], "reagent":["the clear supernatant"]}; | Reagent: clear supernatant; No specific volume provided |
| Wash the cell pellet with 1x DPBS with 20 ng/ml murine M-CSF in a 100 mm Petri dish. Suspend in <MASK> complete DMEM medium. | {"action": "wash", "output": "", "reagent": ["1x DPBS with 20 ng/ml murine M-CSF", "the cell pellet"], "container": ["a 100 mm Petri dish"]}; {"action": "suspend", "output": "", "volume": ["<MASK>"], "reagent": ["complete DMEM medium"]}; | {"in": ["the cell pellet", "1x DPBS with 20 ng/ml murine M-CSF"], "out": []}; {"in": ["complete DMEM medium"], "out": ["suspended cells"]}; | {"action": "wash", "output": "", "reagent": ["1x DPBS with 20 ng/ml murine M-CSF", "the cell pellet"], "container": ["a 100 mm Petri dish"]}; {"action": "suspend", "output": "", "volume": ["15 ml"], "reagent": ["complete DMEM medium"]}; | Latent semantics of unknown unknowns (volume); | {"action": "wash", "output": "", "reagent": ["1x DPBS with 20 ng/ml murine M-CSF", "the cell pellet"], "container": ["a 100 mm Petri dish"]}; {"action": "suspend", "output": "", "volume": ["15 ml"], "reagent": ["complete DMEM medium"]}; | Reagent: 1x DPBS with 20 ng/ml murine M-CSF, complete DMEM medium (15 ml) |
| Divide the supernatants (soluble fractions). ... ... Measure the total protein concentration in both positive cell lysates using the BCA protein assay kit according to the manufacturer’s instructions. | {"action": "divide", "output": "the supernatants (soluble fractions)"}; ... ... {"action": "measure", "concentration": ["total protein concentration"], "output": "", "reagent": ["the BCA protein assay kit", ""]}; | {"in": ["the supernatants (soluble fractions)"], "out": ["divided fractions"]}; {"in": ["positive cell lysates", "BCA protein assay kit"], "out": ["protein concentration"]}; | {"action": "divide", "output": "the supernatants (soluble fractions)"}; ... ... {"action": "measure", "concentration": ["total protein concentration"], "output": "", "reagent": ["the BCA protein assay kit", ""]}; | Latent semantics of unknown unknowns (concentration); | {"action": "divide", "output": "the supernatants (soluble fractions)"}; ... ... {"action": "measure", "concentration": ["total protein concentration"], "output": "", "reagent": ["the BCA protein assay kit", "the supernatants (soluble fractions)"]}; | Reagent: supernatants (soluble fractions), BCA protein assay kit; No specific volume provided |
| Add 2.6 ml <MASK>. Incubate cells at 37°C for 24-48 h. | {"action": "add", "output": "", "volume": ["2.6 ml"], "reagent": ["<MASK>"]}; {"action": "incubate", "output": "", "reagent": ["cells"], "temperature": ["37°C"], "time": ["24-48 h"]}; | {"in": ["<MASK>"], "out": ["treated cells"]}; {"in": ["treated cells"], "out": []}; | {"action": "add", "output": "", "volume": ["2.6 ml"], "reagent": ["fresh culture medium"]}; {"action": "incubate", "output": "", "reagent": ["cells"], "temperature": ["37°C"], "time": ["24-48 h"]}; | Latent semantics of unknown unknowns (reagent); | {"action": "add", "output": "cells", "volume": ["2.6 ml"], "reagent": ["fresh culture medium"]}; {"action": "incubate", "output": "", "reagent": ["cells"], "temperature": ["37°C"], "time": ["24-48 h"]}; | Reagent: fresh culture medium (2.6 ml) |

Comment

| original text | w/ stage1, w/o stage2&3 | utility of stage1 | w/ stage1&2, w/o stage3 | utility of stage2 | w/ stage1&2&3 | utility of stage3 |
| --- | --- | --- | --- | --- | --- | --- |
| Kill most the contaminating spores that have germinated. Centrifuge the spore mixture at <MASK> for 5 min. | {"action": "eliminate", "output": "", "reagent": ["the contaminating spores that have germinated"]}; {"action": "centrifuge", "force": ["<MASK>"], "output": "", "reagent": ["the spore mixture"], "time": ["5 min"]}; | {"in": ["the contaminating spores that have germinated"], "out": []}; {"in": ["the spore mixture"], "out": []}; | {"action": "eliminate", "output": "", "reagent": ["the contaminating spores that have germinated"]}; {"action": "centrifuge", "force": ["1,200 x g"], "output": "", "reagent": ["the spore mixture"], "time": ["5 min"]}; | Latent semantics of unknown unknowns (force); | {"action": "eliminate", "output": "the spore mixture", "reagent": ["the contaminating spores that have germinated"]}; {"action": "centrifuge", "force": ["1,200 x g"], "output": "", "reagent": ["the spore mixture"], "time": ["5 min"]}; | Reagent: spore mixture; No specific volume provided |
| Add pre-hybr soln directly to the hybridization reaction to get hybrid molecule. Incubate for <MASK>. | {"action": "add", "output": "hybrid molecule", "reagent": ["pre-hybr soln", "the hybridization reaction"]}; {"action": "incubate", "output": "", "reagent": [""], "temperature": [""], "time": ["<MASK>"]}; | {"in": ["pre-hybr soln", "the hybridization reaction"], "out": ["hybrid molecule"]}; {"in": ["hybrid molecule"], "out": []}; | {"action": "add", "output": "hybrid molecule", "reagent": ["pre-hybr soln", "the hybridization reaction"]}; {"action": "incubate", "output": "", "reagent": [""], "temperature": [""], "time": ["10 mins"]}; | Latent semantics of known unknowns (time); | {"action": "add", "output": "hybrid molecule", "reagent": ["pre-hybr soln", "the hybridization reaction"]}; {"action": "incubate", "output": "", "reagent": ["hybrid molecule"], "temperature": [""], "time": ["10 mins"]}; | Reagent: pre-hybridization solution, hybridization reaction; No specific volume provided |
| Confirm positive colonies by PCR. Take fluorescence images under <MASK>. | {"action": "confirm", "device": ["PCR"], "output": "positive colonies", "reagent": [""]}; {"action": "take", "device": ["<MASK>"], "output": ["fluorescence images"]}; | {"in": ["PCR"], "out": ["positive colonies"]}; {"in": ["positive colonies"], "out": ["fluorescence images"]}; | {"action": "confirm", "device": ["PCR"], "output": "positive colonies", "reagent": [""]}; {"action": "take", "device": ["microscope"], "output": ["fluorescence images"]}; | Latent semantics of unknown unknowns (device); | {"action": "confirm", "device": ["PCR"], "output": "positive colonies", "reagent": ["RNAs"]}; {"action": "take", "device": ["microscope"], "output": ["fluorescence images"]}; | Reagent: RNAs; No specific volume provided |
| Transfer the flow to a PLG tube. Incubate at <MASK> for 5 minutes. Add 90 µl chloroform. | {"action": "transfer", "output": "", "container": ["a PLG tube"], "reagent": ["the flow"]}; {"action": "incubate", "output": "", "temperature": ["<MASK>"], "time": ["5 minutes"]}; {"action": "add", "output": "", "volume": ["90 µl"], "reagent": ["chloroform"]}; | {"in": ["the flow"], "out": ["PLG tube"]}; {"in": ["PLG tube"], "out": []}; {"in": ["90 µl chloroform"], "out": []}; | {"action": "transfer", "output": "", "container": ["a PLG tube"], "reagent": ["the flow"]}; {"action": "incubate", "output": "", "temperature": ["room temperature"], "time": ["5 minutes"]}; {"action": "add", "output": "", "volume": ["90 µl"], "reagent": ["chloroform"]}; | Latent semantics of unknown unknowns (temperature); | {"action": "transfer", "output": "", "container": ["a PLG tube"], "reagent": ["the flow"]}; {"action": "incubate", "output": "", "temperature": ["room temperature"], "time": ["5 minutes"]}; {"action": "add", "output": "", "volume": ["90 µl"], "reagent": ["chloroform"]}; | Reagent: flow, chloroform (90 µl) |

Review
5

The work identifies the problem of translating from natural language instructions for scientific experiments to machine usable formats and frames it as a program synthesis problem. The proposed approach uses language models (along with other parsing techniques) to extract a structured sequence of instructions from natural language inputs which are then verified using an execution model. This approach is compared against expert translations as well as constraint decoding and prompt engineering baselines.

Strengths

The paper identifies an interesting problem in the AI-for-science domain. It shows that the decomposition of the problem into syntax and semantics can be mapped to operations and reagent flow, which is a useful insight. The paper further introduces useful formalism in the form of the PDG and algorithms to synthesise programs in DSLs. The paper evaluates on multiple datasets against reasonable benchmarks, showing the superiority of the proposed approach.

Weaknesses

The use of BLEU and ROUGE scores as metrics to compare expert and machine-generated instructions is not properly justified. It may in fact be problematic: both metrics measure textual similarity, yet instructions that look similar could have very different semantics (for example: "... pour hot water ..." vs "... pour cold water ...").

The paper does not provide any insight into what each part of the system contributes to the final effectiveness. It would be especially valuable to understand (1) how much language model-based parsing makes a difference and (2) how the constraints imposed by the synthesis prevent the model from incorrect solutions (vs pure prompt engineering baselines) but also provide enough flexibility to do better than standard constraint decoding.

Questions

Why is BLEU/ROUGE used as a metric? Can a more semantically aligned mode of comparison be used here?

Qualitatively, how does the approach behave differently from the baselines? What does each component of the system contribute?

Limitations

The evaluation presented in the paper considers datasets where the inputs are descriptions of experiments in standardised scientific terminology / format. While this demonstrates the system's usefulness in translating such inputs, it is unclear how well it may generalise to inputs following looser terminology / formatting. Since the tool aims to save time for domain experts, it should be demonstrated how much easier it is to write down instructions in this format than to write the machine-usable one directly.

Comment

Here we provide a series of case studies to illustrate the distinctions between the behaviors of the components within our proposed three-stage framework and those of the baselines qualitatively.

| original text | w/ stage1, w/o stage2&3 | utility of stage1 | w/ stage1&2, w/o stage3 | utility of stage2 | w/ stage1&2&3 | utility of stage3 |
| --- | --- | --- | --- | --- | --- | --- |
| Kill most the contaminating spores that have germinated. Centrifuge the spore mixture at <MASK> for 5 min. | {"action": "eliminate", "output": "", "reagent": ["the contaminating spores that have germinated"]}; {"action": "centrifuge", "force": ["<MASK>"], "output": "", "reagent": ["the spore mixture"], "time": ["5 min"]}; | {"in": ["the contaminating spores that have germinated"], "out": []}; {"in": ["the spore mixture"], "out": []}; | {"action": "eliminate", "output": "", "reagent": ["the contaminating spores that have germinated"]}; {"action": "centrifuge", "force": ["1,200 x g"], "output": "", "reagent": ["the spore mixture"], "time": ["5 min"]}; | Latent semantics of unknown unknowns (force); | {"action": "eliminate", "output": "the spore mixture", "reagent": ["the contaminating spores that have germinated"]}; {"action": "centrifuge", "force": ["1,200 x g"], "output": "", "reagent": ["the spore mixture"], "time": ["5 min"]}; | Reagent: spore mixture; No specific volume provided |
| Add pre-hybr soln directly to the hybridization reaction to get hybrid molecule. Incubate for <MASK>. | {"action": "add", "output": "hybrid molecule", "reagent": ["pre-hybr soln", "the hybridization reaction"]}; {"action": "incubate", "output": "", "reagent": [""], "temperature": [""], "time": ["<MASK>"]}; | {"in": ["pre-hybr soln", "the hybridization reaction"], "out": ["hybrid molecule"]}; {"in": ["hybrid molecule"], "out": []}; | {"action": "add", "output": "hybrid molecule", "reagent": ["pre-hybr soln", "the hybridization reaction"]}; {"action": "incubate", "output": "", "reagent": [""], "temperature": [""], "time": ["10 mins"]}; | Latent semantics of known unknowns (time); | {"action": "add", "output": "hybrid molecule", "reagent": ["pre-hybr soln", "the hybridization reaction"]}; {"action": "incubate", "output": "", "reagent": ["hybrid molecule"], "temperature": [""], "time": ["10 mins"]}; | Reagent: pre-hybridization solution, hybridization reaction; No specific volume provided |
| Confirm positive colonies by PCR. Take fluorescence images under <MASK>. | {"action": "confirm", "device": ["PCR"], "output": "positive colonies", "reagent": [""]}; {"action": "take", "device": ["<MASK>"], "output": ["fluorescence images"]}; | {"in": ["PCR"], "out": ["positive colonies"]}; {"in": ["positive colonies"], "out": ["fluorescence images"]}; | {"action": "confirm", "device": ["PCR"], "output": "positive colonies", "reagent": [""]}; {"action": "take", "device": ["microscope"], "output": ["fluorescence images"]}; | Latent semantics of unknown unknowns (device); | {"action": "confirm", "device": ["PCR"], "output": "positive colonies", "reagent": ["RNAs"]}; {"action": "take", "device": ["microscope"], "output": ["fluorescence images"]}; | Reagent: RNAs; No specific volume provided |
| Transfer the flow to a PLG tube. Incubate at <MASK> for 5 minutes. Add 90 µl chloroform. | {"action": "transfer", "output": "", "container": ["a PLG tube"], "reagent": ["the flow"]}; {"action": "incubate", "output": "", "temperature": ["<MASK>"], "time": ["5 minutes"]}; {"action": "add", "output": "", "volume": ["90 µl"], "reagent": ["chloroform"]}; | {"in": ["the flow"], "out": ["PLG tube"]}; {"in": ["PLG tube"], "out": []}; {"in": ["90 µl chloroform"], "out": []}; | {"action": "transfer", "output": "", "container": ["a PLG tube"], "reagent": ["the flow"]}; {"action": "incubate", "output": "", "temperature": ["room temperature"], "time": ["5 minutes"]}; {"action": "add", "output": "", "volume": ["90 µl"], "reagent": ["chloroform"]}; | Latent semantics of unknown unknowns (temperature); | {"action": "transfer", "output": "", "container": ["a PLG tube"], "reagent": ["the flow"]}; {"action": "incubate", "output": "", "temperature": ["room temperature"], "time": ["5 minutes"]}; {"action": "add", "output": "", "volume": ["90 µl"], "reagent": ["chloroform"]}; | Reagent: flow, chloroform (90 µl) |

Comment

Imagine a self-driving kitchen that automatically prepares all ingredients and executes all procedures for cooking a meal according to natural-language-based recipes. Such self-driving kitchens would also benefit significantly from translating human-oriented recipes into formats suitable for machine execution. In the following, we present a running example of such a translation, adapted from [1].

The protocol after pre-processing is as follows.

Pasta Bolognese

Yield: 2 plates

Ingredients:

- 8 [ounces] white fresh {pasta}

- 1 [floz] olive {oil}

- 1/4 [ounce] {garlic}; minced

- 4 [ounces] {onions}; chopped

- 4 [ounces] shallow fried {beef}; minced

- 1 - 1 1/2 [ounce] lean prepared {bacon}

- 1/3 [cup] red {wine}

- 150 [gram] raw {carrots}; thinly sliced

- 2/3 [ounce] concentrated {tomato puree}

- 4 [ounces] red {sweet pepper}; cut julienne

- 1 [ounce] {parmesan} cheese

Instructions:

Add the @oil@ to a large saucepan, heat to <300 F>, and saute the @onions@. 

After |2 minutes|, add the @garlic@. Keep on medium to high heat, and don't stir. 

After |2 minutes| more, add the @beef@.

Fry the @bacon@ in a separate pan, on high heat. Remove liquified fat when done.

Boil @pasta@ in a medium pan, until al dente (~|8 minutes|). Drain when done.

Once the @beef@ is done, add the @carrots@, @sweet pepper@ and @tomato puree@. 

Slowly add the @wine@ as well, to not lower the temperature. Let it simmer (but not boil) for |5-10 minutes|. 

Given the protocol as the input of our framework, the resulting DSL program is as follows.

add(slot = "oil", target = "large saucepan", container = plate_1, emit = mixture_1);

heat(target = mixture_1, temperature = 300F, container = plate_1, postcond = stop());

saute(target = "onions", container = plate_2, duration = 2mins);

add(slot = "garlic", target = mixture_1, container = plate_1, emit = mixture_2);

heat(target = mixture_2, temperature = 325F, container = plate_1, duration = 2mins);

add(slot = "beef", target = mixture_2, container = plate_1, emit = mixture_3);

heat(target = mixture_3, temperature = 325F, container = plate_1, postcond = check_done(target = "beef"));

fry(target = "bacon", temperature = 350F, container = pan_1, postcond = remove(target = "liquified fat"));

boil(target = "pasta", temperature = 212F, container = pan_2, duration = 8mins, postcond = drain());

add(precond = check_done(target = "beef"), slot = ["carrots", "sweet pepper", "tomato puree"], target = mixture_3, container = plate_1, emit = mixture_4);

add(slot = "wine", target = mixture_4, container = plate_1, pace = 1mL/s);

simmer(target = mixture_4, temperature = 211F, duration = 7.5mins);

In this example, we observe that the natural-language-based recipe possesses ambiguities and omissions. Our translation framework addresses these challenges by structuring the recipe at the syntax level, completing the latent information at the semantics level, and linking the programs with necessary resources, such as the usage of plates, at the execution level.
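As a rough illustration of what linking a structured program to resources can make checkable, the sketch below encodes four of the steps above as plain Python dictionaries and groups them by container. The dictionary layout and the helper `containers_in_use` are assumptions made for this example, not the framework's internal representation; steps without an explicit container (such as the final `simmer` call) are reported separately, since they would need the container to be completed from context.

```python
from collections import defaultdict

# Hypothetical encoding of four steps from the program above; the layout is
# an assumption for illustration, not the framework's actual format.
steps = [
    {"op": "add", "slot": "oil", "container": "plate_1"},
    {"op": "fry", "target": "bacon", "container": "pan_1"},
    {"op": "boil", "target": "pasta", "container": "pan_2"},
    {"op": "simmer", "target": "mixture_4"},  # no container given in the program
]


def containers_in_use(steps):
    """Map each container to the step indices using it; also list steps with no container."""
    usage = defaultdict(list)
    unassigned = []
    for index, step in enumerate(steps):
        container = step.get("container")
        if container is None:
            unassigned.append(index)
        else:
            usage[container].append(index)
    return dict(usage), unassigned


usage, unassigned = containers_in_use(steps)
print(usage)
print(unassigned)
```

A per-container view like this is only a starting point; detecting real conflicts would additionally need timing information, which this sketch leaves out.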

References:

[1] Roorda, Auke. "Corel: A DSL for Cooking Recipes." Diss. 2021.

Author response

Why is BLEU/ROUGE used as a metric? Can a more semantically aligned mode of comparison be used here?

This is a very good question. The same concern was considered during the development of our evaluation methodology. Direct comparisons across entire sentences under BLEU/ROUGE scores would indeed pose a problem as the reviewer mentioned. Therefore, to circumvent this issue, we convert all results into a standardized JSON-style format for data representation, and comparisons are made between key-value pairs rather than entire sentences, effectively resolving the metric concern.

Let us consider the example mentioned by the reviewer: we represent the two sentences "... pour hot water ..." and "... pour cold water ..." in the following JSON-style format.

{
 op: "pour",
 reg: "water",
 T: "hot",
}

{
 op: "pour",
 reg: "water",
 T: "cold",
}

The comparison between the two sentences is then transformed into a comparison between two JSON code blocks. We calculate the similarity score cumulatively from the similarity between the values of matched pairs of keys. For instance, for the key "T" (temperature), the values "hot" and "cold" yield a low similarity score under the ROUGE, BLEU, and even the Exact Match metrics. As temperature is one of the major keys within configuration parameters, a high penalty in this dimension significantly affects the cumulative similarity score. With this fine-grained comparison metric, we can comprehensively track the distinctions and commonalities between results without losing expressivity regarding the quantities.
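The key-wise comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used in the paper: `token_overlap` is a simple Jaccard stand-in for BLEU/ROUGE, `keywise_similarity` is a hypothetical helper name, and every key is weighted equally (the response suggests that major keys carry more weight, which is not modeled here).

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of whitespace tokens; a simple stand-in for BLEU/ROUGE."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def keywise_similarity(pred: dict, ref: dict, score=token_overlap) -> float:
    """Average per-key value similarity over the union of keys.

    A key present on only one side contributes 0 (an assumption of this sketch).
    """
    keys = set(pred) | set(ref)
    if not keys:
        return 1.0
    total = 0.0
    for key in keys:
        if key in pred and key in ref:
            total += score(str(pred[key]), str(ref[key]))
    return total / len(keys)


hot = {"op": "pour", "reg": "water", "T": "hot"}
cold = {"op": "pour", "reg": "water", "T": "cold"}
print(keywise_similarity(hot, cold))  # two of the three keys agree
```

Because the mismatch is confined to the "T" key, the penalty is attributed to that parameter alone rather than being diluted across a whole sentence, which is the point of comparing key-value pairs.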

We also acknowledge that there are advanced evaluation metrics, especially in the recent works where LLMs are leveraged as external judges and achieve considerable performance in general testing cases. Our choice of "less advanced" metrics is driven by the intention to focus specifically on domain-specific knowledge, which constitutes the primary scope of this paper and may be relatively sparse in general LLMs. Nonetheless, the exploration of more sophisticated evaluation metrics represents a promising avenue for future research, and we appreciate the reviewer's perceptive recommendation in this regard.

Since the tool aims to save time for domain experts, it should be demonstrated how much it is easier to write down instructions in this format as compared to the machine usable one directly.

Thanks for the comment. The scope of our proposed framework, at the current stage, is to automatically translate human-oriented protocols into formats suitable for machine execution, rather than to help domain experts create new ones in an easier format. The goal is to transfer knowledge from a conventional lab into a format suitable for self-driving labs. Therefore, the human-oriented protocols used in this translation are existing protocols previously designed for human operators in conventional labs, and thus come at no extra cost.

In contrast to conventional protocol translation processes, which require domain experts to manually develop rules and functions based on specialized knowledge, our proposed automatic translator attempts to eliminate the need for such expert intervention. Domain experts are not involved in either the development or the execution stages of our translator. Therefore, the evaluations in our paper mainly focus on translation performance rather than generation efficiency. We appreciate the reviewer's insightful suggestion and will make revisions accordingly for better clarification.

While this demonstrates the system's usefulness in translating such inputs, it is unclear how well it may generalise to inputs following looser terminology / formatting.

Thanks for the comment. The general applicability of our proposed framework beyond experimental sciences can indeed be a common concern. The core value of translating natural-language-based protocols into formats suitable for machine execution substantially lies in facilitating experiments in self-driving labs, thereby accelerating scientific discovery. Experimental protocols come with unique properties and challenges, such as the fine-grained incorporation of domain-specific knowledge, the non-trivial dependency topology between operations, the long-horizon lifecycles of intermediate productions, and the necessity for precise execution without run-time errors. These factors shape the scope of our research problem, emphasising the need to handle protocols with stringent terminology and formatting.

Despite the specific scope of this paper, we are open to exploring the potential for generalizing our framework to other domains with similar challenges as those found in scientific experiments, such as cooking (see C.1-1).

Qualitatively, how does the approach behave differently to the baselines? What does each component of the system contribute?

Thanks for the question. The rationales for the components within our proposed framework are grounded in both empirical and theoretical considerations. We develop the three-stage framework that integrates cognitive insights from human experts with approaches from program synthesis, automata construction, and counterfactual analysis. At the syntax level, we synthesize the operation dependence graph to transform natural-language-based protocols into structured representations, thereby making explicit the operation-condition mappings and the control flows. At the semantics level, we analyze the reagent flow graph to reconstruct the complete lifecycles of intermediate products, addressing the latent, missing, or omitted properties and values. At the execution level, we contextualize both the operation dependence graph and the reagent flow graph within spatial and temporal dynamics, resulting in the protocol dependence graph. This graph conducts counterfactual reasoning to detect potential conflicts or shortages of execution resources and to identify inappropriate combinations of operations in execution sequences (see C.1-2).
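To make the semantics-level idea concrete, the sketch below tracks reagent lifecycles over per-step "in"/"out" lists of the kind shown in the stage-wise example tables of this thread. It is a simplified illustration only: the function names `reagent_lifecycles` and `undefined_inputs` and the matching of reagents by exact string are assumptions of this example, not the framework's reagent flow analysis.

```python
def reagent_lifecycles(operations):
    """Return {reagent: (first_step, last_step)} from per-step "in"/"out" lists."""
    lifecycles = {}
    for index, op in enumerate(operations):
        for reagent in op.get("in", []) + op.get("out", []):
            first, _ = lifecycles.get(reagent, (index, index))
            lifecycles[reagent] = (first, index)
    return lifecycles


def undefined_inputs(operations, initial=()):
    """List (step, reagent) pairs that are consumed before anything provides them."""
    available = set(initial)
    missing = []
    for index, op in enumerate(operations):
        for reagent in op.get("in", []):
            if reagent not in available:
                missing.append((index, reagent))
        available.update(op.get("out", []))
    return missing


# "in"/"out" lists taken from the PLG tube example row in this thread.
ops = [
    {"in": ["the flow"], "out": ["PLG tube"]},
    {"in": ["PLG tube"], "out": []},
    {"in": ["90 µl chloroform"], "out": []},
]

print(reagent_lifecycles(ops))
print(undefined_inputs(ops, initial=["the flow", "90 µl chloroform"]))
```

Inputs that no earlier step provides and that are not listed as initial reagents are candidates for latent or omitted information that a later stage would have to complete.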

Comment

Thanks to the authors for their detailed response. I can see how using BLEU/ROUGE over JSON structured outputs can alleviate the concerns mentioned in my review. The examples of outputs at each stage are also appreciated.

My major remaining concern is with regard to the relevance of the paper to the ML community. As mentioned in my review, including a more detailed discussion of (1) how much language model-based parsing makes a difference and (2) how the constraints imposed by the synthesis prevent the model from incorrect solutions (vs pure prompt engineering baselines) but also provide enough flexibility to do better than standard constraint decoding might make the paper of interest to the broader ML community.

Comment

| original text | w/ stage1, w/o stage2&3 | utility of stage1 | w/ stage1&2, w/o stage3 | utility of stage2 | w/ stage1&2&3 | utility of stage3 |
| --- | --- | --- | --- | --- | --- | --- |
| Transfer the clear supernatant to <MASK>. Incubate at 4 °C with rotation. | {"action": "transfer", "output": "the clear supernatant", "container": ["<MASK>"]}; {"action": "incubate", "output": "", "temperature": ["4 °C"], "reagent":[]}; | {"in": ["the clear supernatant"], "out": ["<MASK>"]}; {"in": ["<MASK>"], "out": []}; | {"action": "transfer", "output": "the clear supernatant", "container": ["a new tube"]}; {"action": "incubate", "output": "", "temperature": ["4 °C"], "reagent":[]}; | Latent semantics of unknown unknowns (container); | {"action": "transfer", "output": "the clear supernatant", "container": ["a new tube"]}; {"action": "incubate", "output": "", "temperature": ["4 °C"], "reagent":["the clear supernatant"]}; | Reagent: clear supernatant; No specific volume provided |
| Wash the cell pellet with 1x DPBS with 20 ng/ml murine M-CSF in a 100 mm Petri dish. Suspend in <MASK> complete DMEM medium. | {"action": "wash", "output": "", "reagent": ["1x DPBS with 20 ng/ml murine M-CSF", "the cell pellet"], "container": ["a 100 mm Petri dish"]}; {"action": "suspend", "output": "", "volume": ["<MASK>"], "reagent": ["complete DMEM medium"]}; | {"in": ["the cell pellet", "1x DPBS with 20 ng/ml murine M-CSF"], "out": []}; {"in": ["complete DMEM medium"], "out": ["suspended cells"]}; | {"action": "wash", "output": "", "reagent": ["1x DPBS with 20 ng/ml murine M-CSF", "the cell pellet"], "container": ["a 100 mm Petri dish"]}; {"action": "suspend", "output": "", "volume": ["15 ml"], "reagent": ["complete DMEM medium"]}; | Latent semantics of unknown unknowns (volume); | {"action": "wash", "output": "", "reagent": ["1x DPBS with 20 ng/ml murine M-CSF", "the cell pellet"], "container": ["a 100 mm Petri dish"]}; {"action": "suspend", "output": "", "volume": ["15 ml"], "reagent": ["complete DMEM medium"]}; | Reagent: 1x DPBS with 20 ng/ml murine M-CSF, complete DMEM medium (15 ml) |
| Divide the supernatants (soluble fractions). ... ... Measure the total protein concentration in both positive cell lysates using the BCA protein assay kit according to the manufacturer’s instructions. | {"action": "divide", "output": "the supernatants (soluble fractions)"}; ... ... {"action": "measure", "concentration": ["total protein concentration"], "output": "", "reagent": ["the BCA protein assay kit", ""]}; | {"in": ["the supernatants (soluble fractions)"], "out": ["divided fractions"]}; {"in": ["positive cell lysates", "BCA protein assay kit"], "out": ["protein concentration"]}; | {"action": "divide", "output": "the supernatants (soluble fractions)"}; ... ... {"action": "measure", "concentration": ["total protein concentration"], "output": "", "reagent": ["the BCA protein assay kit", ""]}; | Latent semantics of unknown unknowns (concentration); | {"action": "divide", "output": "the supernatants (soluble fractions)"}; ... ... {"action": "measure", "concentration": ["total protein concentration"], "output": "", "reagent": ["the BCA protein assay kit", "the supernatants (soluble fractions)"]}; | Reagent: supernatants (soluble fractions), BCA protein assay kit; No specific volume provided |
| Add 2.6 ml <MASK>. Incubate cells at 37°C for 24-48 h. | {"action": "add", "output": "", "volume": ["2.6 ml"], "reagent": ["<MASK>"]}; {"action": "incubate", "output": "", "reagent": ["cells"], "temperature": ["37°C"], "time": ["24-48 h"]}; | {"in": ["<MASK>"], "out": ["treated cells"]}; {"in": ["treated cells"], "out": []}; | {"action": "add", "output": "", "volume": ["2.6 ml"], "reagent": ["fresh culture medium"]}; {"action": "incubate", "output": "", "reagent": ["cells"], "temperature": ["37°C"], "time": ["24-48 h"]}; | Latent semantics of unknown unknowns (reagent); | {"action": "add", "output": "cells", "volume": ["2.6 ml"], "reagent": ["fresh culture medium"]}; {"action": "incubate", "output": "", "reagent": ["cells"], "temperature": ["37°C"], "time": ["24-48 h"]}; | Reagent: fresh culture medium (2.6 ml) |

Author response

We thank all reviewers for their time and valuable comments. The feedback is both substantial and helpful for improving our paper. In this work, we systematically study the problem of translating experimental protocols written for humans into those suitable for self-driving laboratories. Accordingly, we propose a three-stage workflow that incrementally constructs Protocol Dependence Graphs at the syntax, semantics, and execution levels. Our qualitative and quantitative results underscore the framework's potential to accelerate and democratize the process of scientific discovery.

We would like to thank the reviewers for recognizing the following strengths of our work:

  1. The paper identifies "an interesting problem in the AI for science Domain" (reviewer #RqLD), "is well-motivated and of great significance in advancing AI applications in scientific discovery" (reviewer #uEvE), and addresses "a critical gap in the transition from AI-driven discoveries to empirical experimentation" (reviewer #AwgV).
  2. The proposed method, which is a "novel, automated approach to protocol translation for self-driving laboratories" (reviewer #AwgV), provides "a useful insight" regarding "the decomposition of the problem into syntax and semantics" (reviewer #RqLD), and "further introduces useful formalism in the form of the PDG and algorithms to synthesise programs in DSLs" (reviewer #RqLD).
  3. The evaluations are conducted on "multiple datasets against reasonable benchmarks" (reviewer #RqLD), showing that "the proposed approach outperforms pure LLM-based synthesis and matches the manual translation by human experimenters" (reviewer #uEvE).
  4. The paper is "well-structured and clearly written" (reviewer #AwgV), and "is easy to follow, although the required background knowledge is non-trivial" (reviewer #uEvE).

Based on the reviewers' comments, we made revisions including:

  1. Clarifying certain concepts to enhance the paper's accessibility for readers with a background outside experimental sciences.
  2. Demonstrating running examples of the behaviors of different components within our proposed three-stage framework in detail to make the paper more comprehensive.
  3. Conducting additional analyses and discussions regarding the computational complexity, safety, and theoretical foundation of our proposed framework to make the paper more rigorous and self-consistent.

In the following, we address specific questions for each reviewer.

Final decision

The paper presents a method for integrating AI into the scientific discovery process in the form of a framework that translates scientific protocols written in natural language into a structured form that could conceivably be executed by a self-driving laboratory. The approach is validated through quantitative and qualitative evaluations.

The paper appears well written, and the idea is novel and well motivated. Results show that the synthesized protocol translations match those written manually by human experimenters. Identified weaknesses are that the proposed solution is a portfolio of standard applications of existing tools or well-known algorithms, which may be less interesting to the ML community. The method also requires substantial computational resources, which might limit its accessibility.