U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List

Backbone-independent NMR resonance assignments of methyl probes in large proteins

Santrupti nerli.

1 Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064 USA

2 Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, The Children’s Hospital of Philadelphia and Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, 3501 Civic Center Boulevard, Philadelphia, PA 19104 USA

Viviane S. De Paula

3 Núcleo Multidisciplinar de Pesquisa em Biologia, Universidade Federal do Rio de Janeiro, Duque de Caxias, RJ 25245-390 Brazil

Andrew C. McShan

Nikolaos g. sgourakis, associated data.

Nuclear Magnetic Resonance assignments for IL-2, REC2, HNH, and REC3, used as blind targets, have been deposited in the BMRB under accession numbers 28104, 28105, 28106, and 28110, respectively. The unprocessed and processed NMR data along with the structural ensembles and input parameters used to run MAUS are provided for all targets in the help page of the MAUS website. Other data are available from the corresponding author upon reasonable request.

The MAUS server is available at https://methylassignment.chemistry.ucsc.edu .

Methyl-specific isotope labeling is a powerful tool to study the structure, dynamics and interactions of large proteins and protein complexes by solution-state NMR. However, widespread applications of this methodology have been limited by challenges in obtaining confident resonance assignments. Here, we present Methyl Assignments Using Satisfiability (MAUS), leveraging Nuclear Overhauser Effect cross-peak data, peak residue type classification and a known 3D structure or structural model to provide robust resonance assignments consistent with all the experimental inputs. Using data recorded for targets with known assignments in the 10–45 kDa size range, MAUS outperforms existing methods by up to 25,000 times in speed while maintaining 100% accuracy. We derive de novo assignments for multiple Cas9 nuclease domains, demonstrating that the methyl resonances of multi-domain proteins can be assigned accurately in a matter of days, while reducing biases introduced by manual pre-processing of the raw NOE data. MAUS is available through an online web-server.

Here, the authors present Methyl Assignments Using Satisfiability (MAUS), a method for the assignment of methyl groups using raw NOE data. They use eight proteins in the 10–45 kDa size range as test cases and show that MAUS yields 100% accurate assignments at high completeness levels.

Introduction

The use of methyl probes has opened new avenues for the application of nuclear magnetic resonance (NMR) methods to study large molecular machines 1 . The signal enhancement offered by methyl-transverse relaxation optimized spectroscopy (TROSY) techniques 2 and a suite of experiments for quantitative characterization of protein dynamics occurring over a broad range of timescales 3 have rendered methyl-based NMR a formidable tool for detailed mechanistic studies of important biological systems 4 . The main bottleneck in applications of methyl-based NMR is obtaining confident resonance assignments. In the conventional approach, backbone assignments are first established using triple-resonance experiments 5 . Then, methyl resonances are connected to the backbone using either methyl out-and-back experiments 6 or, more commonly, using 15 N and 13 C edited amide-to-methyl nuclear Overhauser effect (NOE) measurements 7 . However, in the absence of previously established backbone assignments, deriving confident assignments for methyl resonances remains a challenge. Although site-directed mutagenesis of individual methyl-bearing residues 8 provides unambiguous assignments, the laborious, costly, and time-consuming nature of this approach limits applications to study larger, multi-domain proteins.

Recently, several automated methyl assignment methods have been proposed 9 – 12 . To circumvent the need for existing backbone assignments, methyl NOE data and a known structure of the target protein can be used to derive a set of possible assignments, by fitting local NOE networks to methyl distances derived from the three-dimensional (3D) structure. In general, these methods start from fitting sub-sets of NOE connectivities to local clusters of methyls in the structure and then expand those to derive self-consistent assignments for the remaining methyl resonances. The NOE peak intensities are then used in an optimization process aiming to further reduce the space of solutions. However, any method that uses local information is bound to make mistakes globally and, as a result, most of these methods can miss the correct (ground truth) assignments from their scope of solutions, yielding a significant (up to 55%) error rate. An alternative approach is to perform an exhaustive mapping of the global NOE network to the target structure. This strategy was first outlined in the method MAGMA 13 , which treats methyl assignments as a maximum subgraph matching problem, and invokes a graph theory algorithm 14 to enumerate all assignments, which satisfy the maximum number of NOE connectivities. If the input contains only true positive NOE data (i.e., the NOE data graph is a subgraph of the structure graph), then MAGMA resorts to the VF2 algorithm 15 . Alternatively, maximum subgraph matching provides a reasonable compromise to account for the minimum number of false-positive NOEs, with the caveat that the ground truth solution may entail additional misidentified NOEs in the input. Moreover, MAGMA requires users to provide an annotated data graph, derived through an extensive, manual analysis of 3D or four-dimensional (4D) NOE spectroscopy (NOESY) peak data, which introduces an additional processing step, limiting its use by non-experts.

In the present work, we describe an automated system (MAUS: Methyl Assignments Using Satisfiability), which first formulates a set of rules, and then provides a compact description of all assignment possibilities that are consistent with these rules. Specifically, MAUS generates a structure graph, G , representing all methyl NOE connectivities present in an input Protein Data Bank (PDB) structure or structural model of the protein of interest, and multiple independent data graphs, H , containing all possible NOE networks, which can be derived from a list of raw 3D or 4D NOESY peaks. The NOE network is supplemented with residue type, stereospecificity, and geminal methyl connectivity constraints. Then, MAUS leverages an efficient algorithm to determine all valid ways of mapping every H into G (termed subgraph isomorphism), which respects all the experimental inputs. We test our method on a benchmark set of protein targets in the 10–45 kDa size range and show that MAUS maintains a robust performance, providing 100% accurate assignments at high levels of completeness, while offering a significant performance advantage relative to existing methods using the same inputs. Using MAUS, the methyl resonances of large, multi-domain proteins can be assigned accurately in a matter of days, completely bypassing the need for more laborious backbone-based NMR spectroscopy approaches.

Results and discussion

Methyl assignments as a subgraph isomorphism problem.

Rather than treating the methyl assignments as a maximum subgraph matching problem 13 , MAUS models the NOE data as a sparse sample of all possible connectivities present in the input structure. MAUS uses the NOE network together with additional experimental inputs, such as peak residue type information and geminal methyl resonance connectivities, to build a system of hard constraints. The constraints outline a subgraph isomorphism problem of fitting a sparse data graph H , into the original structure graph G (see “Methods” and Fig.  1a ).

An external file that holds a picture, illustration, etc.
Object name is 41467_2021_20984_Fig1_HTML.jpg

a Iterative workflow of the MAUS system. b Description of a structure graph, G with (~50–150) methyls as nodes. The edges of G correspond to all possible short-range (blue; up to 6–8 Å), long-range (black; from 6 up to 10 Å), and geminal methyl connectivities (green). c The 2D projections of 3D or 4D NOESY peaks are clustered (within tolerances; gray circles) to 2D HMQC reference peak positions (P1…P8). A NOESY peak can be clustered uniquely (red) or ambiguously (green and blue; overlapping peaks), leading to ~2 30 clustering combinations for typical 3D C M -C M H M NOESY data. d The observed chemical shift coordinates (C 1 ,C 2 ,H 2 ) of all NOE peaks are used to construct a symmetrization graph, S , with partitions representing upper (C 1  > C 2 ) and lower (C 2  > C 1 ) diagonal cross-peaks. S has nodes represented by short-range (blue) and long-range (black) NOEs, and edges connecting potentially symmetric NOE peaks. S has components of sizes 1 (no symmetry or NS), 2 and 3 (simple), and >3 (complex, CC), producing ~2 14 possibilities in the case of data recorded for maltose-binding protein, MBP. Simple components are used as constraints whereas complex components are reduced using an iterative process within MAUS: the maximum matchings (MMs) of each complex component are tested for satisfiability; satisfiable edges are retained in S . e A N-ary tree showing data graphs ( H 1 ,H 2 ,H 3 ,…,H n ) generated from all clustering and symmetrization alternatives (~2 44 possibilities); a majority of these data graphs do not lead to satisfying assignments and are eliminated by MAUS (red circles). Each remaining H can be mapped in ~2 100 possible ways onto G ; all valid methyl resonance assignments are enumerated and presented to the user in a compact form.

To fully account for all methyl connectivities consistent with the input structure and spin diffusion effects 16 , alternative side-chain rotamers are modeled using the program Rosetta 17 and maximum distances of 8 and 10 Å are applied to derive a structure graph, G , containing all possible short- and long-range NOEs, respectively (Fig.  1b ). In addition, MAUS explicitly considers (i) all possible mappings of 3D NOE cross-peaks to two-dimensional (2D) reference peaks, or clusters (Fig.  1c ), and (ii) all possible matchings between upper-diagonal and lower-diagonal NOE cross-peaks, formally analyzed as connected components of a bipartite symmetrization graph (Fig.  1d ). This approach relieves the user from the burden of interpreting the raw data (3D or 4D NOE peaks), through an explicit and exhaustive consideration of all possible data graphs consistent with the input NOE peaks (Fig.  1e ).

MAUS leverages a special-purpose constraint satisfaction solver (SAT) to enumerate all valid assignments using an iterative ansatz (see “Methods”). Using 3275 simulations of NOE data and structure graphs from a non-redundant set of 147 protein structures, we show that, relative to the VF2 algorithm 15 , SAT maintains a robust performance for problems of different sizes, topologies, and data sparsity levels, delivering accurate assignments in a matter of seconds (Supplementary Fig.  1 ). Finally, using the MAGMA benchmark set of eight targets 13 with the identical structure and data graphs provided to both programs, we find that, although MAGMA is marginally faster than MAUS for targets, which can be solved relatively quickly by both methods, MAUS maintains a consistent performance across all targets, including larger, more complex cases such as the 81.4 kDa maltose synthase G (Supplementary Table  1 , Supplementary Results ).

MAUS workflow and results on targets with known assignments

We tested MAUS using representative data sets recorded for a benchmark set of four protein targets spanning a range of sizes, folds and domain complexities: human β 2 -microglobulin (Hβ 2 m; 12 kDa, all-β-fold, single domain), maltose-binding protein (MBP; 41 kDa, all-α-fold, two-domain), and two major histocompatibility complex class-I (MHC-I) molecules of divergent heavy-chain sequences (HLA-A01; 45.5 kDa and HLA-A02; 44.8 kDa, mixed α/β-fold, three-domain) (Table  1 ). The X-ray structures and ground truth assignments for these targets were obtained from the PDB and BMRB, respectively. To obtain a consistent set of experimental data for all targets, we prepared 13 C/ 1 H (MA)ILV-methyl-labeled samples and acquired one 2D reference 1 H- 13 C heteronuclear multiple quantum coherence (HMQC) and two 3D 13 C M - 13 C M 1 H M NOESY-HMQC spectra, recorded with short (50 ms) and long (300 ms) mixing times (Fig.  2 , Supplementary Fig.  2 , and Supplementary Table  2 ). Besides providing supplementary short-range (<6–8 Å) constraints, the short mixing time NOEs help reduce complex components of the symmetrization graph (Fig.  1d ) and allow determination of geminal connectivities for resonances corresponding to 13 Cγ 1 / 13 Cγ 2 and 13 Cδ 1 / 13 Cδ 2 methyls of Val and Leu residues, respectively. Finally, towards reducing spectral overlap in the 2D reference spectra of larger (>20 kDa) targets, we prepared proS-labeled samples, stereospecifically defining the resonances of Leu/Val methyls (Fig.  2 ). Using this information as input for MAUS, we find that among all possible clustering/symmetrization alternatives resulting from peak overlap (2 40 -2 80 ), only a tractable number (2 10 -2 20 ) can lead to valid assignment solutions (Supplementary Table  3 ) and are further explored in exhaustive subgraph isomorphism ( H into G ) enumerations using the SAT algorithm (Fig.  1e ). Given the sparsity of experimental NOE data sets and our definition of a valid solution, each subgraph isomorphism instance has up to 2 100 valid solutions for a typical 100-methyl protein. However, the solution space is not uniformly distributed among methyl peaks; remarkably, results for all targets in our set show that the NOE network, residue type and stereospecificity constraints are sufficient to provide unambiguous assignments for a large fraction (64–89%) and low-ambiguity (two to three) options for the majority (11–30%) of remaining methyl resonances (Tables  1 and ​ and2, 2 , and Supplementary Fig.  3 ).

MAUS methyl resonance assignment statistics.

TargetMW (kDa)PDB IDOrigin/resolution (Å)Number of methylsLabeling schemeShort-range NOE distance% Unique assignments% Options ≤ 3 and >1
Hβ m 11.91JNJNMR35AILV6.58911
HLA-A0145.56AT9X-ray/2.987AILV7.26430
HLA-A0244.81DUZX-ray/1.894AILV6.07015
MBP40.71DMBX-ray/1.876MILV 6.07521
IL-215.41M47X-ray/1.961ILV6.57423
HNH 15.76O56NMR/1.953ILV8.08911
REC215.64CMPX-ray/2.669ILV6.55912
REC324.54ZT0X-ray/2.985ILV8.06728

a Non-stereospecific.

b Used symmetrization and clustering tolerances of 0.1 ppm for 13 C.

An external file that holds a picture, illustration, etc.
Object name is 41467_2021_20984_Fig2_HTML.jpg

Two separate protein samples are recommended for generating the standard inputs of MAUS. Sample 1 uses non-stereospecific methyl labeling with either (i) 13 C/ 1 H labeling only at the methyls of Met, Ala, Leu, Val, and Ile residues, on a 12 C/ 2 H background or (ii) 13 C/ 1 H labeling at the methyls and 13 C/ 2 H at the sidechains of methyl-bearing residues on a 12 C/ 2 H background, creating linearized spin systems 43 , which can be used to unambiguously distinguish Leu/Val methyl resonances 18 (Supplementary Figs.  4–6 ). When sample i is used, resonances of Leu/Val peaks can be partially distinguished using an automated chemical shift-based classifier within MAUS (“Methods”). Both samples can be used to define reference methyl chemical shifts using real-time (sample i) or constant-time (sample ii) 2D methyl-HMQC experiments. Long-range (up to 10 Å) NOEs are recorded using a 3D C M -C M H M SOFAST NOESY experiment (300 ms mixing time). A complementary NOESY experiment recorded with a short (typically 50 ms) mixing time is used to identify short-range NOEs, including between the geminal methyl resonances of Leu and Val residues. An additional protein sample 2 is prepared with stereospecific labeling (proS or proR) of Leu and Val methyls, which resolves spectral overlap in the 2D reference HMQC spectrum, and, together with sample 1 , allows unambiguous determination of geminal methyl pairs for Leu/Val. 13 C nuclei are displayed in red. Steps of NMR data analysis are colored green, with the resulting data set illustrated as a red rectangle. The standard input files for MAUS (blue) are the primary sequence, the PDB coordinate file or model structure, the 2D 1 H- 13 C HMQC peak list, and two 3D C M -C M H M SOFAST NOESY peak lists (recorded with short and long mixing time). MAUS also accepts partial assignments that could be included in the user-annotated 2D 1 H- 13 C HMQC list, together with a specification of allowed residue types, stereospecificity, and geminal methyl connectivities (if present) for each 2D reference peak. The output includes methyl assignment statistics, residue type classification, NOE assignments, and final lists of assignment options. Although MAUS does not consider the NOE peak intensities, this information can be evaluated by the user toward further reducing assignment ambiguities in the output lists.

Performance comparison of different methyl assignment programs.

Target (labeling scheme)No. of methylsMAUS all definedMAUS LV ambiguousMAGIC all definedMAGIC LV ambiguousFLAMEnGO 2.4MAP-XSIIMethylFLYA
Hβ m (AILV)3531/021/027/416/817/130/5ND
4/014/02/05/38/00/0
0/00/00/00/00/00/0
002390
0%0%12%34%4%14%
772914765
HLA-A01 (AILV)8756/048/058/450/1156/379/864/0
26/030/017/111/01/00/00/0
5/09/03/02/40/00/00/0
00492700
0%0%6%15%5%9%0%
15182964222312704
HLA-A02 (AILV)9466/045/057/5ND 64/378/1677/0
14/014/024/00/00/00/0
14/035/03/00/00/00/0
0052700
0%0%6%4%17%0%
171820331151024
MBP (MILV )7657/046/049/5ND ND 66/1052/0
16/020/010/10/00/0
3/010/05/00/00/0
00600
0%0%9%13%0%
192110017853
IL-2 (ILV)6145/06/023/23ND 21/436/2523/2
14/032/05/10/00/00/0
2/023/02/00/00/00/0
0073600
0%0%46%16%41%9%
5945130132553
HNH (ILV)5347/027/039/038/029/238/1544/0
6/018/012/23/24/00/00/0
0/08/00/06/40/00/00/0
00001800
0%0%4%5%6%28%0%
4450791512814
REC2 (ILV)6941/037/021/326/723/428/357/2
8/012/020/414/50/00/00/0
20/020/06/33/10/00/00/0
0012134260
0%0%15%23%15%56%29%
5587218357814
REC3 (ILV)8557/028/058/054/247/361/24
24/028/019/019/38/00/0ND
4/029/07/06/00/00/0
0011270
0%0%0%6%5%39%
1414581315116

ND not determined.

For results using a chemical shift-based classifier of Leu/Val resonances, see Supplementary Table 4 .

First row: number of methyls with one option correct/wrong.

Second row: number of methyls with 2–3 options correct/wrong.

Third row: number of methyls with >3 options correct/wrong.

Fourth row: number of unassigned methyls.

Fifth row: error rate (in %).

Sixth row: run time in minutes.

a MethylFLYA returned an error.

b The simulations were allowed to run for at least 96 h but did not complete.

Robustness of MAUS to missing experimental inputs

For optimal results, MAUS normally requires an NOE network with degree connectivity of 3.5 or greater (corresponding to an average of 1.75 experimentally observed NOE peaks per methyl). In order to unambiguously define the resonances of Leu and Val methyls and apply these as constraints to MAUS, we recommend the use of a homonuclear decoupling method 18 (Supplementary Fig.  4c, d ). However, to account for cases where peak residue type information is not available experimentally, we also tested MAUS on all targets by either (i) defining resonances in the Leu/Val region as either Leu or Val, or (ii) engaging a chemical shift-based classifier, within MAUS (see “Methods”). Removing Leu/Val residue type information has a significant impact on MAUS performance, resulting in an average of 27% decrease in assignment completeness across our benchmark set (Table  2 , columns 3 and 4). Here, the use of the MAUS classifier to distinguish between confident Leu and Val methyl resonances led to an improved average of 18% less uniquely assigned peaks. Nonetheless, the accuracy of all assignments remained 100% and the ground truth solution was valid across all targets (Table  2 and Supplementary Table  4 ). For example, for MBP, MAUS assigned 71% of resonances uniquely and an additional 17% with two or three options while maintaining 100% accuracy, despite the absence of experimental residue type information. In addition, if users cannot provide any annotations of the 2D reference peaks in the input, then MAUS inherently loses geminal connectivity constraints for all methyl resonances corresponding to Leu and Val residues, in addition to the residue type. It is therefore recommended, when using data sets where both geminal methyls are labeled, that MAUS is run using either missing geminal connectivity or Leu/Val residue type information, but not both.

Performance of MAUS using simulated NOE networks from PDB structures

Although results using our benchmark set of four targets with known assignments and recently recorded data highlight the utility of our method for proteins of sizes up to ~46 kDa, methyl-based NMR can be applied on much higher-molecular weight proteins. To test the size limit supported by MAUS, we expanded our simulation benchmark set to include 63 PDB structures with structure graphs containing more than 100 methyls from Ile, Leu and Val residues (corresponding to system sizes of up to 116 kDa) (Fig.  3 ). We chose to simulate assignment cases corresponding to the Isoleucine-Leucine-Valine (ILV), as opposed to the Alanine-Isoleucine-Leucine-Valine (AILV) labeling scheme, as a large fraction of the resonances corresponding to Ala methyls are missing from the spectra of larger proteins. For each target, a data graph was simulated by removing edges from the corresponding structure graph (defined as all methyl connectivities up to 10 Å present in the 3D structure) until a degree connectivity (defined as 2 × number of edges/number of nodes) of 3.5 or higher was reached. This value was chosen to represent experimental cases with 1.75–2.1 observed NOEs/methyl resonance, which are characteristic of real-life data sets. We find that, as long as this requirement is satisfied, MAUS can efficiently tackle the computational complexity of the graph-matching problem and provide meaningful assignments in a reasonable time (up to 4 h on a single central processing unit, CPU), even for larger targets. Specifically, our simulation results show that we can obtain a high coverage (60–80%) of methyl groups with 1–3 residue assignment options for targets up to 352 methyls. Moreover, MAUS can address proteins of up to 716 methyls/200 kDa (as exemplified by the Teneurin 2 partial extracellular domain, PDB ID: 6FB3), albeit in a longer time (11 h on a single CPU), which is still feasible from a computational standpoint.

An external file that holds a picture, illustration, etc.
Object name is 41467_2021_20984_Fig3_HTML.jpg

a Top: target size distribution (in terms of number of methyl groups from Ile, Leu, and Val residues) of 63 proteins with high-resolution PDB structures of sizes up to 116 kDa (352 methyls; PDB ID: 5WTI) (top). For each target, a data graph H was simulated by removing edges from the corresponding structure graph G (defined as all methyl connectivities up to 10 Å present in the 3D structure) until a degree connectivity (defined as 2 × number of edges/number of nodes) in the range of 3.5–4.2 was reached, corresponding to 1.75–2.1 simulated NOEs/methyl. Bottom: bar plot showing the distribution of degree connectivities among all simulated data graphs. b Scatter plot showing run time (in minutes) taken by MAUS to perform exhaustive enumeration of the possible methyl assignment options for each H into G mapping case. Colors indicate % of methyls with up to three assignment options, according to the scale on the right.

De novo resonance assignments of Cas9 nuclease domains

We further evaluated the performance of MAUS on deriving de novo, blind assignments for four representative targets; three domains of the Cas9 nuclease and the therapeutic cytokine interleukin-2 (IL-2) (15.4 kDa) (Fig.  4 and Supplementary Figs.  5–7 ). These comprise biomedically relevant proteins of mostly α-helical folds, leading to a high degree of spectral overlap, which challenges automated assignment methods. Cas9 is a 160 kDa RNA-guided endonuclease, which introduces DNA double-strand breaks upon site-specific recognition of a short nucleotide Protospacer Adjacent Motif, preceding the cleavage site 19 . The Cas9 enzyme comprises of a recognition lobe (REC) that forms an RNA:DNA hybrid through three subdomains (REC1–3), and a nuclease lobe with HNH and RuvC domains, which cleave the DNA strand that is complementary and non-complementary to the guide RNA, respectively (Fig.  4a, b ). Towards establishing methyl assignments, we designed optimized constructs of individual Cas9 domains, HNH (15.7 kDa), REC2 (15.6 kDa), and REC3 (24.5 kDa), showing well-dispersed methyl NMR spectra (Fig.  4c ). For these three domains, MAUS assigned 89%, 59%, and 67% of methyl resonances uniquely, and further provided assignment lists with two to three options for 11%, 12%, and 28%, respectively, while always maintaining 100% accuracy, relative to the ground truth assignments (Fig.  4d , Table  1 , and “Methods”). We applied a selective-decoupling experiment to distinguish Leu/Val resonances 18 (Supplementary Figs.  6 and 7 ) and used conventional methods to obtain reference backbone and methyl assignments for validating the MAUS results ( Supplementary Results ). Concurrently with the development of the present work, the backbone and side-chain assignments for the HNH domain were obtained using a conventional backbone-based approach and released in the Biological Magnetic Resonance Data Bank (BMRB, entry 27949) 20 , 21 . The MAUS-derived methyl assignments were in full agreement with the published results, which further highlights the practical utility of our method in saving machine time and manual effort spent.

An external file that holds a picture, illustration, etc.
Object name is 41467_2021_20984_Fig4_HTML.jpg

a Domain organization of Spy Cas9 composed of the recognition lobe (REC) and nuclease lobe (NUC). BH, bridge helix; PI, PAM-interacting. b Surface representation of SpyCas9 (PDB ID 4cmp) depicting the bilobed architecture. Protein domains are colored as in a . c Single domains used in the divide-and-conquer approach. 1 H- 13 C HMQC spectra of selectively labeled HNH (green), REC2 (red), and REC3 (blue) domains at Ile δ 1 - 13 CH 3 ; Leu, Val- 13 CH 3 / 13 CH 3 positions, acquired at 800 MHz, 25 °C. The spectra of the individual domains of Cas9 indicate that they retain their proper fold when in isolation. d Number of valid assignment options for each residue identified by MAUS for HNH, REC2, and REC3, respectively. The colored spheres represent final valid resonance assignment options: violet (1 option), green (2 options), yellow (3 options), and red (>3 options).

For IL-2, MAUS identified unique assignments for 74% and provided a short list (two to three) of confident options for an additional 23% of all methyl resonances (Table  1 and Supplementary Fig.  3 ). With the MAUS lists in hand, we manually considered the NOE peak intensities to reach complete (>95%) assignments for all targets in a matter of hours. Our results show that, together with the quality of the NMR sample, the protein fold and methyl chemical shift dispersion is equally important for obtaining complete assignments. Notably, the amide 15 N- 1 H TROSY spectrum of the Cas9 REC3 domain exhibits a significant fraction of missing backbone amide resonances (~55%, residues 660–712), likely due to conformational exchange-induced line broadening. Thus, methyl-based NMR supported by MAUS, is a practically useful tool to study not only larger systems with intractable backbone amide spectra, but also enabling routine applications for the study of medium-sized targets.

Comparison with previous methods

When compared with other methyl assignment tools (MAGIC/FLAMEnGO2.4/MAP-XSII) using the same inputs for all targets with known assignments in our benchmark set, MAUS produces 100% accurate results, while the previous methods show error rates of up to (34/16/56%), and significantly lower assignment completeness rates (Table  2 and Supplementary Table  5 ). Consistently with the results obtained for our benchmark set, MAUS maintains an improved performance on the 4 blind targets in our set (IL-2, REC2, HNH, and REC3), in terms of both unambiguous assignments and error rate (Table  2 ).

A recent commercially available method, MethylFLYA 12 , outperforms the existing publicly available methyl assignment tools. We carried out a detailed comparison of MAUS (also readily available to users via an online interface) with MethylFLYA on the eight targets used in our study, by providing the identical inputs to both methods. Upon comparison of the number of resonances assigned uniquely by MAUS and with high confidence by MethylFLYA, we find that MAUS outperforms MethyFLYA for 6/8 targets in our set (Hβ 2 m, MBP, IL-2, HNH, REC2, and REC3). Specifically, (i) for IL-2 and REC2, MethylFLYA reports a significantly lower number of confident assignments (38% and 10%) relative to MAUS (74% and 60%), while also producing two incorrect assignments for each target, (ii) for HNH and MBP, it produces confident assignments at lower completeness levels, and (iii) for Hβ 2 m and REC3, MethylFLYA crashes (segmentation fault), producing no results. Although MethylFLYA assigns a higher fraction of methyl resonances for the two larger targets in our set (HLA-A01 and HLA-A02), it requires a prohibitively longer computation time on a single CPU and, in general, it is less efficient than MAUS by two to three orders of magnitude (Table  2 ). Finally, although MethylFLYA provides low-confidence options for some of the unassigned methyl resonances, MAUS provides a list of valid options for each 2D reference peak, which contains the correct assignment, owning to the fact that it applies a satisfiability-based enumeration process that is both exact, and exhaustive.

Consistent methyl assignments guided by sequence-based models

The structures of targets which lack a representative template in the PDB can be modeled using homology-based methods 22 or, in cases with no identifiable sequence homologs in the PDB, sequence co-evolution approaches 23 , 24 . To examine the latter, more challenging case, we used trRosetta (transform-restrained Rosetta) 25 , which has demonstrated high accuracy for complex Critical Assessment of Structure Prediction 26 targets. We derived a set of models for a representative case in our benchmark set, MBP using the trRosetta web-server, and used this together with our experimental NOE data recorded on a MILV proS sample (Supplementary Table  2 ) to run MAUS. The models showed a 3.3 Å average all-atom root-mean-square deviation (RMSD) from the X-ray structure, nonetheless, MAUS still provided unique and low-ambiguity (two to three options) assignments for 68% of resonances (relative to 96% using the X-ray structure), while maintaining 100% accuracy (Fig.  5a ).

An external file that holds a picture, illustration, etc.
Object name is 41467_2021_20984_Fig5_HTML.jpg

a Top: flowchart showing the combined de novo structure modeling and methyl resonance assignment strategy using trRosetta (transform-restrained Rosetta) and MAUS, respectively. (bottom) Overlay of trRosetta model (wheat) and X-ray structure (PDB ID: 1DMB; pale cyan) of maltose-binding protein. Number of resonance assignment options identified by MAUS are indicated by colored spheres (violet: 1, green: 2, yellow: 3 and red: >3). b Top: structures of a designed homotrimeric protein 27 , in the open (solved by X-ray crystallography) and closed (solved by solution NMR) states showing inter-methyl distances corresponding to experimental NOEs identified in a 3D C M -C M H M NOESY spectrum, assigned manually 27 . Gray spheres show methyl residues participating in the NOE network. The dashed lines represent pairwise distances in the two models, also shown as bar plots (bottom) for the open (gray) and closed (golden) states. The dashed line in the bar plot highlights the 10 Å NOE upper distance limit used by MAUS. Using the unassigned NOE peaks, MAUS delivers a set of satisfying assignments when provided with a model of the closed, but not of the open structure as input.

As structure prediction methods may also lead to globally incorrect models, we examined whether MAUS can distinguish between two closely related structures of the same protein sequence directly from unprocessed 3D NOE peaks (picked directly from 3D NOESY data using a signal-to-noise (S/N) ratio ≥ 5). To test this, we used a designed homotrimeric protein (XAA) consisting of a multi-layer, three-helix bundle with modular termini that may adopt two divergent structures corresponding to an open (observed by X-ray) and a closed state, which corresponds to the solution structure determined using manual NOE assignments by our group 27 . Starting from the unassigned NOE peaks, we ran MAUS using both models as inputs including both intra- and inter-chain connectivities in the structure graph G , and found that only the closed state led to assignments which satisfy the global NOE network within our default 10 Å upper limit. MAUS finds that the data graph is not isomorphic to the structure graph derived from the open state, due to a subset of NOE constraints that are violated by more than 25 Å (Fig.  5b ). This result justifies our choice of an exact subgraph isomorphism approach, since any attempt to arbitrarily remove NOEs (i.e., edges of H ) not explained by the input structure (i.e., edges of G ) could lead to a cascade effect, resulting to global changes in the space of solutions.

In summary, we demonstrate that a satisfiability-based approach can deliver reliable assignments for a range of targets amenable to solution-state NMR, using unprocessed NOE peak data (3D or 4D peak lists) and an existing structural model with the correct overall fold. Our results using NOESY spectra recorded for proteins with known assignments as well as for several blind targets show that, unlike previous methods, MAUS provides 100% accurate solutions for a large fraction (60–80%) of methyl groups, thereby reducing manual effort, costs, and errors introduced due to manual pre-processing and validation of the data. We further demonstrate that, in the absence of high-resolution structures or structural homologs in the PDB, sequence co-evolution-based models can be used by MAUS, without compromising the correctness of produced assignments. Alternatively, for larger, multi-domain proteins with complex methyl spectra, a user may apply a divide-and-conquer strategy, supported by MAUS. Using our online web-server, users can now assign the methyl spectra of large, multi-domain proteins in a matter of a few days, also considering the time it takes to record the NMR data. Our method opens new possibilities for studying challenging, complex molecular machines, as illustrated here for the Cas9 nuclease.

Protein expression and purification of benchmark targets

DNA corresponding to the luminal domains of Hβ 2 m, HLA-A*01:01, and HLA-A*02:01 expressed in Escherichia coli BL21 (DE3) cells, refolded and purified as described 27 , 28 . Briefly, Hβ 2 m, HLA-A*01:01, and HLA-A*02:01 induced with 1 mM isopropyl β- d -1-thiogalactopyranoside (IPTG) at an OD600 of 0.7 at 37 °C for 4 h, isolated from inclusion bodies and refolded in vitro. Inclusion bodies were isolated from E. coli cell pellets by sonication, following by a wash with 100 mM Tris pH 8, 2 mM EDTA, 0.1% (v/v) deoxycholate, and solubilization in 5.5 M guanidine hydrochloride under reducing conditions. For in vitro refolding of Hβ 2 m, 100 mg protein was slowly diluted dropwise over 24 h into refolding buffer (0.4 M arginine-HCl, 2 mM EDTA, 4.9 mM reduced l -glutathione, 0.57 mM oxidized l -glutathione, 100 mM Tris pH 8.0) at 4 °C while stirring. For in vitro refolding of HLA-A*01:01 and HLA-A*02:01, 10 mg of peptide (ILDTAGKEEY for HLA-A*01:01 and LLFGYPVYV for HLA-A*02:01), and 100 mg heavy chain and 100 mg Hβ 2 m were slowly diluted dropwise over 24 h into refolding buffer (0.4 M arginine-HCl, 2 mM EDTA, 4.9 mM reduced l -glutathione, 0.57 mM oxidized l -glutathione, 100 mM Tris pH 8.0) at 4 °C while stirring. All refolding proceeded for 4 days at 4 °C without stirring. Purification of Hβ 2 m, HLA-A*01:01, and HLA-A*02:01 was performed by size-exclusion chromatography (SEC) with a HiLoad 16/600 Superdex 75 pg column at 1 mL/min with running buffer (150 mM NaCl, 25 mM Tris pH 8).

XAA was expressed in E. coli BL21 Star (DE3) cells and purified as described 26 . Briefly, XAA was induced with 0.1 mM IPTG at an OD600 of 0.6 and expressed at 18 °C for 17 h. Cells were resuspended in lysis buffer (25 mM Tris pH 8, 300 mM NaCl, 20 mM imidazole) and lysed by sonication. Supernatant was applied to nickel-nitrilotriacetic acid (Ni-NTA) resin pre-equilibrated with lysis buffer. The column was rinsed with ten column volumes of wash buffer (25 mM Tris pH 8, 300 mM NaCl, 20 mM imidazole). XAA was eluted using two column volumes elution buffer (25 mM Tris pH 8, 300 mM NaCl, 250 mM imidazole). The hexahistidine tag was removed via thrombin cleavage by incubating with 1 : 5000 thrombin (EMD, Millipore) for 15 h at 25 °C. Cleaved XAA was purified by SEC using a Superdex 75 10/300 GL column (GE Healthcare) at 1 mL/min with running buffer (150 mM NaCl, 25 mM Tris pH 8).

The MBP-cyclodextrin complex was prepared as described previously 29 . Briefly, MBP was expressed in E. coli strain BL21(DE3) cells, induced with 1 mM IPTG at an OD600 of 0.7 and expressed at 37 °C for 5 h. Purification of MBP-cyclodextrin was achieved as follows. First, MBP was bound to an amylose affinity column in the presence of binding buffer (150 mM NaCl, 25 mM Tris pH 8, 1 mM EDTA) and then eluted using eluted buffer (150 mM NaCl, 25 mM Tris pH 8, 1 mM EDTA, 10 mM maltose). Second, MBP was further purified by SEC with a HiLoad 16/600 Superdex 75 pg column at 1 mL/min with running buffer (150 mM NaCl, 25 mM Tris pH 8). MBP was then partially unfolded for 3 h at 25 °C in 150 mM NaCl, 25 mM Tris pH 8, 2.5 M guanidinium hydrochloride, and refolded by dilution into GuHCl-free buffer containing 2 mM β-cyclodextrin. Following purification all proteins were exhaustively buffer exchanged in their respective buffers (see section “Stereospecific isotopic labeling”).

Streptococcus pyogenes Cas9 domain constructs

The coding sequences for Cas9 HNH (residues 776–908) and REC2 domains (residues 167–307) were synthesized, codon optimized for expression in E. coli , and subcloned into a His 10 -MBP expression vector (Genscript) (Supplementary Table  6 ). pSHS325 bacterial expression plasmid for SpCas9 REC3 domain was a gift from Jennifer Doudna & Keith Joung (Addgene plasmid #101205) 30 . Each protein was expressed in E. coli BL21 (DE3) containing chaperone plasmid pG-KJE8 (TAKARA, 3340) to enhance protein folding 31 and purified as described below 32 . Briefly, when cells reached an OD600 of ~0.6, IPTG was added to a final concentration of 0.5 mM to induce protein expression. Cells were then grown for an additional 18 h at 23 °C. Collected cells were resuspended in lysis buffer (50 mM Tris pH 7.5, 500 mM NaCl, 5% (v/v) glycerol, and 1 mM Tris(2-carboxy-ethyl) phosphine (TCEP) containing an EDTA-free protease inhibitor tablet (Roche). The cell suspension was sonicated on ice and clarified by centrifugation at 27,000 ×  g for 15 min. The soluble lysate fraction was bound in batch to Ni-NTA agarose (Qiagen). The resin was washed extensively with 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, 10 mM imidazole, and 5% (vol/vol) glycerol, and the bound protein was eluted in 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, 300 mM imidazole, and 5% (vol/vol) glycerol. The His 10 -MBP affinity tag was removed with His 10 -tagged TEV protease during overnight dialysis against 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, and 5% (vol/vol) glycerol. The protein was then flowed over Ni-NTA agarose to remove TEV (Tobacco Etch Virus) protease and the cleaved affinity tag, and further purified by SEC on a Superdex 200 16/60 column (GE Healthcare) in 20 mM Tris pH 7.5, 200 mM KCl, 1 mM TCEP, and 5% (vol/vol) glycerol.

Human IL-2 expression, refolding, and purification

Codon optimized DNA encoding the human IL-2 (amino acids 1–133) with a site-specific mutation (C140S) (Supplementary Table  6 ) was expressed in BL21 (DE3) E. coli cells as inclusion bodies. Protein expression was achieved by induction with 1 mM IPTG at an OD600 of 0.6 followed by cell growth at 37 °C for 5 h at 200 r.p.m. For in vitro refolding, ~30 mg of inclusion bodies was dropped diluted into 200 mL of refolding buffer (1.1 M guanidine, 6.5 mM cysteamine, 0.65 mM cystamine, 110 mM Tris pH 8.0) at 4 °C while stirring. Refolding proceeded overnight at 4 °C without stirring. The solution was dialyzed into a buffer of 20 mM MES pH 6.0, 25 mM sodium chloride. Purification of refolded IL-2 was performed by cation exchange chromatography with a CAPTO-SP column using a 25 mM to 1 M NaCl gradient in a buffer with 25 mM MES pH 6.0 followed by SEC with a Superdex 75 column (GE) at 0.5 mL/min with running buffer (50 mM NaCl, 20 mM sodium phosphate pH 6.0). Protein concentrations were determined using A 280 measurements on a NanoDrop with extinction coefficients estimated with the ExPASy ProtParam tool.

Preparation of NMR samples, backbone, and methyl assignments

All proteins were overexpressed in M9 medium in 2 H 2 O containing 2 g l −1 2 H 13 C glucose (Sigma #552151) and 1 g l −1 15 NH 4 Cl. Selective methyl labeling, referred to as ILV*, was achieved by the addition of appropriate precursors (ISOTEC Stable Isotope Products (Sigma-Aldrich) as detailed previously 28 , 33 . ILV-methyl (Ile 13 Cδ 1 ; Leu 13 Cδ 1 / 13 Cδ 2 ; Val 13 Cγ 1 / 13 Cγ 2 ) U-[ 15 N, 2 H]-labeled proteins were prepared in M9 medium in 2 H 2 O, supplemented with 2 g l −1 2 H 12 C glucose (Sigma #552003) and 1 g l −1 15 NH 4 Cl. De novo stereospecific methyl assignments were achieved by utilizing three independently prepared isotopically labeled samples: AILV or ILV (Ala 13 Cβ; Ile 13 Cδ 1 ; Leu 13 Cδ 1 / 13 Cδ 2 ; Val 13 Cγ 1 / 13 Cγ 2 in an otherwise U-[ 15 N, 12 C, 2 H] background), ILV proS (Ile 13 Cδ 1 ; Leu 13 Cδ 2 ; Val 13 Cγ 2 in an otherwise U-[ 15 N, 12 C, 2 H] background), and ILV* (Ile 13 Cδ 1 only; Leu 13 Cδ 1 / 13 Cδ 2 ; Val 13 Cγ 1 / 13 Cγ 2 in an otherwise U-[ 15 N, 13 C, 2 H] background). We employed a multipronged approach where backbone assignments are used to aid the assignment of side-chain methyl groups. Specifically, we first obtained backbone assignments using TROSY-based 3D HNCA, HN(CA)CB, and HNCO experiments recorded with the ILV*-labeled samples. Acquisition times of 30 ms ( 15 N), 14 ms ( 13 CO), and 10/5 ms ( 13 Cα / 13 Cβ) in the indirect dimension were used. Backbone amide assignments were confirmed using amide-to-amide NOEs obtained from 3D H M -NH N and 3D N-NH N SOFAST NOESY-HMQC experiments 7 . Final backbone assignments were further validated using TALOS-N. Next, Ala, Ile, Leu, and Val methyl assignments were achieved using methyl-to-methyl NOEs observed in 3D H M -C M H M and 3D C M -C M H M SOFAST NOESY-HMQC in addition to methyl-to-amide NOEs observed in 3D H N -C M H M and 3D C M -NH N SOFAST NOESY-HMQC experiments 7 recorded on the AILV- or ILV-labeled samples. For the 3D H N -C M H M SOFAST NOESY, the acquisition parameters were 64, 48, and 1280 complex points in the 1 H N , 13 C M , and 1 H M dimensions with corresponding acquisition times of 14.5, 10.8, and 79 ms with eight scans/FID. For the 3D N-C M H M SOFAST NOESY, the acquisition parameters were 48, 40, and 1280 complex points in the 15 N, 13 C M , and 1 H M dimensions with corresponding acquisition times of 26, 9, and 79 ms with eight scans/FID (free induction decay), respectively.

Leu/Val geminal pairs were determined by comparing NOE strips in 3D C M -C M H M SOFAST NOESY experiments recorded using short (50 ms) and long (300 ms) mixing times. The acquisition parameters for the 3D H M -C M H M SOFAST NOESY and 3D C M -C M H M SOFAST NOESY are provided in Supplementary Table  2 . Ile δ 1 methyl types were identified by their characteristic upfield chemical shifts. Leu and Val methyl types were identified using phase-sensitive 2D 1 H- 13 C HMQC experiments recorded on the ILV*-labeled sample 18 . Alanine methyl types were identified by comparing 2D 1 H- 13 C HMQC spectra of AILV and ILV proS samples, and by comparing Ala C β chemical shifts from HN(CA)CB experiments. Finally, stereospecific Leu δ 2 and Val γ 2 methyl assignments were obtained by comparison of 2D 1 H- 13 C HMQC spectra of AILV- and ILV proS -labeled samples. All the data were recorded at a 1 H field of 750 or 800 MHz at 25 °C. The 3D SOFAST NOESY experiments were recorded at 800 MHz, 25 °C using a recycle delay of 0.2 s, and NOE mixing time of 50 ms and 300 ms. Typical data acquisition times were 1 h for the 2D HMQC, 8–10 h for the 50 ms, and 12–14 h for the 200 ms 3D SOFAST NOESY-HMQC experiments. All spectra were acquired using Topspin 3 acquisition software from Bruker. The data were processed in NMRPipe 34 and analyzed in NMRFAM-SPARKY 35 and CcpNMR 36 programs.

Stereospecific isotopic labeling

A specifically methyl-labeled acetolactate precursor (2-[ 13 CH 3 ], 4-[ 2 H 3 ] acetolactate) was obtained through deprotection and exchange of the protons of the methyl group in position four of ethyl 2-hydroxy-2-( 13 C)methyl-3-oxobutanoate (FB reagents) achieved in D 2 O at pH 13 37 . Typically, 300 mg of ethyl 2-hydroxy-2-( 13 C)methyl-3-oxobutanoate was added to 24 mL of a 0.1 M NaOD/D 2 O solution. After 30 min, the solution was adjusted to neutral pH with DCl and 2 mL of 1 M Tris pH 8.0 in D 2 O was added. For the production of highly deuterated [U- 2 H], I-[ 13 CH 3 ]δ 1 , L-[ 13 CH 3 ]proS, V-[ 13 CH 3 ]proS samples, 300 mg/L of 2-[ 13 CH 3 ], 4-[ 2 H 3 ] acetolactate, prepared as described above, was added 1 h prior to induction (OD600 ≈ 0.55). Forty minutes later (i.e., 20 min prior to induction), 3,3-[ 2 H 2 ],4-[ 13 C]-2-ketobutyrate (SIGMA #589276) was added to a final concentration of 60 mg/L. Each protein was induced as described above.

For each labeled protein sample, the concentration and buffer composition was as follows:

- 0.5 mM HLA-A01 in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 0.5 mM HLA-A02 in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 1.3 mM Hβ 2 m in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 0.5 mM XAA domain in 20 mM sodium phosphate (pH 6.2), 100 mM NaCl, 5% D 2 O.

- 0.8 mM MBP domain in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 2.0 mM HNH domain in 20 mM HEPES (pH 7.5), 80 mM KCl, 5% D 2 O.

- 0.5 mM REC2 domain in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 0.7 mM REC3 domain in 20 mM HEPES (pH 7.5), 200 mM KCl, 5% deuterated glycerol-d8, 1 mM TCEP, 0.01% NaN 3 , 5% D 2 O.

- 0.4 mM IL-2 in 20 mM sodium phosphate (pH 6.0), 50 mM NaCl, 5% D 2 O.

Generating a structure graph G in MAUS system

The input PDB structure is utilized by MAUS to construct an undirected graph G (Fig.  1b ). First, for all pairs of methyls in the input structure, we compute the following function of the average distance between all pairs of protons from the two methyl groups (Eq. ( 1 )):

Where p and q are methyl groups and p i  and q j represent protons of each methyl group in an input structure s , and d denotes the Euclidean distance. When calculating an effective distance between the nine pairs of methyl protons, the average value of the distances (i.e., multiplicity correction) used here results in a minor increase in upper distance bounds relative to more commonly used r −6 summation. For instance, if we have three distance measures 4 Å, 5.5 Å, and 7 Å, the regular r −6 summation results in a value of 3.9 Å, whereas the average value is 4.7 Å, which remains within the range of the observed interproton distances. Due to the fact that MAUS explicitly considers different side-chain rotamers in addition to applying relatively large upper distance thresholds, the exact choice of distance averaging does not influence our results significantly (as opposed to a more precise estimate of upper bounds that is required for NMR structure determination applications).

To account for alternative side-chain rotamers, the input structure is subjected to n (quick: 10 or thorough: 100) independent relaxations using the FastRelax 38 protocol in Rosetta. Therefore, the element of the adjacency matrix Δ( p ,  q ) is defined by taking the minimum of the distance functions observed among all sampled conformations (Eq. ( 2 )):

We call an edge of the structure graph, G (i) geminal, if it corresponds to a connectivity between the γ 1 / γ 2 and δ 1 / δ 2 methyls of Leu and Val residues, (ii) long-range if 6 Å < Δ( p ,  q ) ≤ 10 Å, and (iii) short-range if Δ( p ,  q ) ≤ 6 Å.

Deriving all possible data graphs, H , from the input NOE data

The two 3D NOESY spectra (recorded with short and long mixing times) are picked at a typical S/N level of 5 or higher and provided as an input list of (C1, C2, H2) coordinates. In addition, a high-resolution 2D reference methyl-HMQC spectrum is picked to identify all reference (C, H) methyl resonances. For larger (>20 kDa) protein targets, simultaneous consideration of the 2D spectrum from a separate, stereospecifically labeled proS sample helps resolve overlap, to identify the exact 13 C and 1 H chemical shifts corresponding to each observable methyl in the NMR sample. The 2D reference and two 3D NOESY peak lists are provided as inputs to MAUS.

We model all alternative NOE connectivities resulting from overlap in the input 2D and 3D spectra, to allow an explicit consideration of all possible data graphs, H , which are consistent with the input data. First, a 3D NOESY cross-peak can be projected to one or several 2D reference peaks (Fig.  1c ). In particular, a 3D maximum (C1, C2, H2) can be projected to a 2D maximum (C, H) if and only if the following holds true (Eqs. ( 3 ) and ( 4 )):

The tolerance values are set, by default, to 0.15 p.p.m. for 13 C and 0.02 p.p.m. for 1 H chemical shifts. NOE peaks with no identifiable projection, as well as diagonal NOEs are eliminated. This process yields clusters of tentative NOE connectivities for each peak in the 2D reference set.

Second, each valid (true positive) connectivity present in the input data must arise from one upper-diagonal NOE cross-peak with one corresponding, symmetry-related lower-diagonal counterpart. However, identifying unique symmetry relations between pairs of 3D NOE cross-peaks can be challenging due to spectral overlap. Specifically, two 3D NOE peaks (C1, C2, H2) and (C1′, C2′, H2′) are potentially symmetric (Eqs. ( 5 ) and ( 6 )) if and only if

To explore all possible symmetry relations present in the input data, we construct a bipartite symmetrization graph connecting upper-diagonal to lower-diagonal NOE cross-peaks (Fig.  1d ). An edge appears between all pairs of peaks that are potentially symmetric. For typical NOE data sets, the bipartite symmetrization graph consists of several connected components (Fig.  1d ). Isolated peaks are eliminated. Components of sizes 2 and 3, termed simple, provide edges of the data graph, H . Larger, complex components are iteratively reduced to smaller ones using explicit satisfiability, eliminating edges that do not yield satisfying assignments in exhaustive enumerations, as outlined in detail below (Fig.  1d ).

Reducing subgraph isomorphism to satisfiability

A boolean function ƒ on a set of boolean variables x = { x 1 , x 2 , x 3 . . . , x n } is said to be satisfiable if there exists an assignment x ∈ { 0 , 1 } n such that ƒ ( x ) evaluates to 1. The question of whether a function f is satisfiable is known as the satisfiability (SAT) problem in computer science. A boolean function ƒ is in conjunctive normal form (CNF) if it is a conjunction of one or more clauses, where a clause is a disjunction of literals; in other words, it is an AND of ORs. Although satisfiability is NP-complete, i.e., theory suggests that no polynomial-time algorithm exists, in practice, efficient satisfiability solvers have been developed, solving formulas with millions of variables and clauses from milliseconds to a few seconds 39 .

The resonance assignment problem is a subgraph isomorphism problem, which is also NP-complete 13 . MAUS reduces resonance assignment to a boolean satisfiability problem, exploiting the power of satisfiability solvers to explore the space of all possible assignment solutions exhaustively. Specifically, given an instance of a resonance assignment problem, we construct a boolean formula introducing the following variables:

  • X ( i, j ): a boolean variable declaring whether 2D peak i is assigned to methyl j in the input structure.
  • Y ( k , l ): a boolean variable declaring whether NOE peak k is clustered to 2D peak l .
  • Z ( k , k ′): a boolean variable declaring whether NOE peaks k and k ′ are truly symmetric.

Next, MAUS combines these variables into a CNF formula imposing the following hard constraints:

  • For every 2D peak, exactly one X ( i , j ) is true.
  • For every methyl in the structure, at most one X ( i , j ) is true.
  • For every NOE, exactly one Y ( k , l ) is true.
  • If Z ( k , k ′) is true, then in the symmetrization graph k and k ' must be matched in a maximum matching (MM; outlined below).
  • If Z ( k , k ′) is true, then the methyl assignments of the 2D projections of NOE peaks k and k ′ must be connected by an appropriate edge in the structure graph, G .

In summary, MAUS casts the methyl assignment subgraph isomorphism problem as a system of Boolean constraints captured by a CNF formula and uses CryptoMiniSAT 39 , a general-purpose solver that provides either (i) a single graph-matching solution at a millisecond timescale on a single CPU or (ii) a mathematical proof that the formula is unsatisfiable. As there can be multiple mappings consistent with the input constraints, MAUS utilizes the solver recursively to enumerate all valid solutions (see below). In particular, MAUS truncates the solution space of the formula, imposing further restrictions on the range of a node, and then recursively checking for novel solutions in the restricted formula. Due to the fact that it uses a custom-built CNF formula, MAUS has the flexibility to add or remove constraints from the formula and perform 1000s of calculations of the subgraph isomorphism problem in a feasible time frame.

Using these two principles of (1) employing a custom-based formula encoding our system of hard constraints and (2) leveraging a fast and efficient solver, MAUS can perform an exhaustive enumeration of the space of solutions, despite the NP-completeness of the problem, using the following iterative ansatz:

  • (i) Obtain a single valid mapping of peaks into methyls (validity is imposed by the hard constraints) from the solver.
  • (ii) Starting from this valid mapping, consider an arbitrary peak (p1) that has been assigned to a particular methyl (m1) in (i) and add a temporary clause to the propositional CNF formula that p1 cannot be assigned to m1.
  • (iii) Invoke the satisfiability solver a second time to see if it can return an alternative solution for that peak. If it can, then the CNF formula is modified again to include the new solution to the list of excluded assignments and this process is repeated until the solver reaches unsatisfiability (i.e., there are no more assignments of p1 that can lead to a mapping, which satisfies the CNF formula).
  • (iv) Add hard constraints imposing that p1 can only be assigned to exactly one of the valid options and repeat steps (i) through (iii) to enumerate the assignment options for a different peak, until all peaks have been considered.
  • (v) Repeat step (iv) until all peaks have been considered.

Through this iterative process, MAUS achieves full enumeration of the peak support sets (containing all valid methyl assignments) in a time that is proportional to the sum of the support set sizes, without resolving to the use of heuristics. If no solutions exist, MAUS must return a proof that all possible assignments have been considered. This method is akin to the celebrated Davis–Putnam–Logemann–Loveland procedure for the satisfiability problem 40 , 41 .

Reducing complex components of the symmetrization graph

MAUS reduces the ambiguity arising from spectral overlap in the NOESY data, which is formally described in the form of a bipartite symmetrization graph, allowing it to consider all possible NOE connectivities that can be derived from the raw NOE peaks without any manual effort by the user (Fig.  1d ). Simple components of the graph, containing up to two edges, directly impose satisfiability constraints. For each complex component (i.e., a set of nodes containing at least three edges), MAUS first generates all possible MMs using a O(N 3 ) algorithm efficiently 42 . Every MM is iteratively examined, introducing additional clauses to the CNF formula and running the satisfiability solver. If MAUS encounters an edge, which causes the formula to become unsatisfiable, then this edge is eliminated from all MMs, repeating until no further edges can be eliminated. Thus, the bipartite graph-matching process is iterated many times as an “outer loop”, providing possible constraints to the CNF solver, which serves to eliminate some of the possible matchings. This allows us to identify a maximal, self-consistent set of upper/lower-diagonal NOEs, which is used to provide the final constraints to the solver, completely removing this burden from the user. During this process, complex components decompose into simple components, recovering additional NOE connectivities into H (Supplementary Table  7 ). The small fraction of NOE peaks, which remain within complex components, are not used by MAUS.

Chemical shift-based residue type classifier

To predict the residue type of a given 2D reference peak when this information is not available from additional experiments, we have analyzed assigned chemical shift data in the BMRB. In particular, we have derived correlated 13 C, 1 H chemical shift distributions for each methyl atom belonging to Met, Ala, Leu, Val, and Ile, using all assigned resonances in the database. We then constructed a table associating 13 C and 1 H pairs with the frequency of each residue type, ranked in decreasing order. A 2D peak can be tentatively assigned to a minimum set of residues, such that their cumulative frequency is ≥99%. Using our benchmark targets, we have evaluated the performance of the classifier for Leu and Val residues, as the resonances of all other residues can be unambiguously determined from their position on two 2D spectra recorded using complementary (MA)ILV and ILV proS samples (Supplementary Table  4 ).

Inputs to MAUS

MAUS accepts as inputs, a user-annotated 2D HMQC list, two 3D NOESY lists recorded using long (300 ms) and short (50 ms) mixing times or a 4D NOESY list recorded using long mixing time (300 ms) and a PDB file. In this work, the experimental peak lists were picked manually using a S/N threshold of 5 and guided by the 2D reference spectrum. When two different users in our group picked the same data sets, they arrived at exactly the same assignment result using MAUS, suggesting that this approach is robust to any biases introduced by the user. This simple process allows us to construct NOE peak tables for each target in under 1 h.

The user can input a crystal structure, an NMR ensemble or an ensemble of models computed using structure prediction methods. MAUS also supports oligomeric systems with regular symmetry, by considering subunit interfaces between different chains in the input PDB file, as specified by the user. Each 2D HMQC peak in the input must be provided in a custom format which specifies residue type(s), a unique number representing the peak, geminal connectivity and stereospecificity information (for peaks corresponding to Leu and Val). To ensure consistency, the user must provide the NMR construct sequence (used to confirm that all methyls present in the NMR samples are also present in the input PDB file) and the labeling scheme used to prepare the NMR sample. The choice of methyl labeling scheme is target-specific and should be optimized such as to achieve the maximum number of probes, while retaining a well-resolved 2D methyl spectrum. Although our manuscript focuses on the most commonly used ILV scheme, MAUS also supports the methyl-bearing side chains of Ala and Met. The user may instruct MAUS to use custom radii (upper limits) for short and long mixing time NOEs (see Table  1 and “Methods” for parameters used for our benchmark targets). The radii must be chosen within recommended ranges (6–8 Å for short-range and 10–12 Å for long-range NOEs). In practice, the smallest possible distance threshold that defines a structure graph that is subgraph isomorphic to the data graph is the optimal distance threshold to run MAUS. A rational approach for choosing this distance is outlined here as follows: (i) we first run MAUS using a standard 8 Å distance threshold for the 50 ms NOESY data, which we assume to be the maximum possible 50 ms NOE distance. This assumption holds for all targets used in this study, also considering that the structure graph already takes into account alternative side-chain rotamers and other local variations of the structure. (ii) We then reduce the threshold by steps of 0.1 until MAUS can no longer fit the NOE network into the structure graph. (iii) The final distance threshold is either the minimum distance identified in ii augmented by 0.5 or 8 Å (we select the smaller value). We find that, for all targets, a threshold in the range 6–8 Å gives optimum results, while always retaining the ground truth within the solution range. This strategy for finding the optimal threshold is carried out manually using a series of short preliminary MAUS runs on our online web-server (in cases where MAUS cannot fit the NOE network, a descriptive message is returned to the user immediately). Running all targets using a conservative 8 Å threshold for the 50 ms NOE data still provides valid and meaningful results, with assignment completeness levels (measured in terms of 1–3 assignment options) that decrease by 3% for HLA-A02, 6% for MBP, 40% for IL-2, and 26% for REC2 targets in our benchmark set, relative to the user-optimized thresholds (Supplementary Table  8 ). MAUS also employs tunable tolerance parameters, for diagonal NOEs, clustering, and symmetrization, which are set to default values mentioned in previous sections. Detailed instructions for submitting jobs and creating input files in the required format for MAUS are outlined on the help page of our website.

Inconsistencies in peak residue-type information or geminal methyl connectivities in the inputs provided by the user (such as misidentification of a Leu peak as a Val or misidentification of geminal methyls) will, in most cases, lead to an unsatisfiable system of constraints due to inconsistency with the global NOE network. To avoid this and guide the user towards obtaining confident assignments, in the MAUS input page we recommend that the user should only provide residue type and geminal information if they have 100% confidence and are guided by conclusive evidence from a range of additional experimental inputs (Fig.  2 ) (residue type and proS-selective labeling, 2D experiments that achieve selective inversion of Leu peaks further assisted by the observation of strong geminal NOEs). We find that, when provided with this information, determination of geminal methyls and residue types can be achieved with 100% confidence by the user as shown by our results on eight benchmark and blind targets. Alternatively, if the user is not confident in the residue type and geminal connectivities, our online interface fully supports a mixed input, where for some methyls this information is either not given or a list of options is provided (e.g., Leu or Val). Finally, the user also has the option to engage the MAUS residue type and geminal methyl classifier, which provides 99% accurate results. Together, these options effectively address any possible inconsistencies in the residue type/geminal methyl user inputs.

Inconsistencies in the input NOE peak lists, due to spectral artifacts, errors in peak picking, which may also include incorrectly identified peak centers. To address this type of data discrepancy, we recommend that users pick 3D or 4D NOESY data at sufficiently high S/N > 5 and remove spectral artifacts manually. However, the most critical form of proactively removing inconsistent data points is provided by our rigorous clustering and symmetrization procedures, which requires that every valid NOE constraint: (i) corresponds to a single methyl group on the 2D 13 C/ 1 H reference HMQC spectrum, through the explicit consideration of all possible clustering scenarios, and (ii) arises from two exclusively symmetry-related cross-peaks, through the construction of an explicit bipartite graph between upper- and lower-diagonal NOEs (Fig.  1c, d ). The information of which NOE peaks were not considered due to violating either 1 or 2 is provided back to the user in a comprehensive table.

MAUS output

MAUS provides a detailed output file with all valid resonance assignment options for each methyl resonance, together with an explanation of how each NOE cross-peak was used by the method. The output file provides information about all non-diagonal NOEs: the 2D peak(s) to which each NOE peak was clustered and its symmetry-related peak. If the NOE peak is not used as a constraint either due to no identifiable symmetric peaks or because it is part of a complex component, which cannot be reduced further, it is indicated in the output file so that the user can revisit the data. In addition, all NOEs corresponding to geminal methyl connectivities are indicated in the output file. This classification summarizes how the NMR data was utilized by MAUS (see Supplementary Table  7 for all benchmark targets used in this study). In the final section, all the resonance assignment options are listed together with a summary of assignment statistics. The total % of resonance assignments for all targets reported here is computed by taking the sum of unambiguous and ambiguous (one to three options) options. If MAUS cannot find a satisfying assignment due to erroneous input data, it provides helpful diagnostic information for the user to troubleshoot their inputs. The input files for all targets in our benchmark set together with detailed description are provided as tutorials in the help page of the MAUS website. MAUS’ exhaustive enumeration process, by definition, does not include methyls that are uniquely assigned to one resonance as options in the assignments lists of any other groups with more than one options, due to the use of explicit constraints in the CNF formula, which prevent this from happening. When a single resonance assignment option is provided to the user, this means that MAUS has explored all possible solutions of the graph-matching problem globally and has found that this is the only option that is consistent with the input NOE data graph and any other constraints supplied by the user. This principle provides a powerful tool for deriving highly complete and 100% accurate assignments as follows. For cases with two or three options, the user can manually resolve the correct assignment aided by intensity considerations through inspection of interproton distances in the X-ray or NMR structure (intensities are not considered by MAUS). Here, the user can also consider NOEs that have been ignored by MAUS due to high peak overlap or unclear symmetry, as marked on the annotated output peak table. Methyls for which the correct assignment can be confidently determined by the user can be then fixed in the MAUS input, to trigger a next round of automated assignments. We find that this iterative procedure “propagates” constraints through the data graph, leading to resolution of all assignments in a matter of a few hours while requiring minimal intervention by the user, even for systems with highly complex spectra.

Transform-restrained Rosetta

We used the trRosetta protocol, which makes use of the orientation restraints predicted using deep residual neural networks, together with distance restraints from co-evolution information to model structures at a high resolution in the Rosetta force field 25 . We provided as input the sequence of MBP to trRosetta using a web-server available at https://yanglab.nankai.edu.cn/trRosetta/ . An ensemble of five models output by trRosetta was supplied to MAUS together with our 2D and 3D NMR data, to obtain the resonance assignments.

Comparison with publicly available assignment algorithms using NOE peak lists

Protons were added to X-ray structures using the NIH PDB Utility Server ( https://spin.niddk.nih.gov/bax/nmrserver/pdbutil/sa.html ). In all simulations, the symmetrization tolerance was set to 0.15, 0.15, and 0.02 p.p.m. for the 13 C NOE , 13 C indirect , and 1 H direct dimensions, respectively, except Cas9 HNH domain, for which 0.1 p.p.m. was used for 13 C NOE and 13 C indirect . In all simulations, the long mixing time NOE distance threshold was set to 10 Å, whereas the short mixing time NOE distance threshold was optimized for different targets in the 6–8 Å range (see Table  1 ).

The 2D input lists for MAGIC were generated from the NMRFAM-SPARKY-exported 2D 1 H- 13 C HMQC list and the 50 ms 3D C M -C M -H M NOESY list (used to automatically determine geminal pairing). The resulting 2D list was curated for accuracy regarding both methyl residue type and geminal pairing.

The experimental chemical shift file (methyl.exp) was generated from the NMRFAM-SPARKY-exported 2D 1 H- 13 C HMQC list using an in-house script. Predicted chemical shifts (methyl.pre) were determined using CH3Shift. Linewidths in the 300 ms 3D C M -C M -H M NOESY data were determined using NMRFAM-SPARKY or CcpNmr. FLAMEnGO v2.4 was used. A total of 100,000 Monte Carlo rounds were performed for each calculation. The final result was taken as the summary of 10 independent runs.

The experimental chemical shift file (methyl.exp) was generated from the NMRFAM-SPARKY-exported 2D 1 H- 13 C HMQC list using an in-house script. Predicted chemical shifts (methyl.pre) were determined using CH3Shift. The weight of chemical shift to NOE was kept as 0.2 (the default). The final result was taken as the summary of ten independent runs.

The 2D and 3D input peak lists were picked manually in NMRFAM-SPARKY and were used to generate inputs for MethylFLYA 12 . For every target, we carried out 100 independent calculations using three different distance cutoff values (4.5, 5.0, and 5.5 Å) on a single processor to generate expected NOESY peaks from the input structure. Consistent with the parameters used to run all methods, we used 13 C and 1 H chemical shift tolerance values of 0.15 and 0.02 p.p.m., respectively, for all targets with the exception of HNH, where a 13 C tolerance of 0.1 p.p.m. was used. Consensus results from all runs for assignments classified as strong by MethylFLYA (considered reliable) are reported in Table  2 .

Reporting summary

Further information on research design is available in the  Nature Research Reporting Summary linked to this article.

Supplementary information

Acknowledgements.

We acknowledge Alison Lindberg and the ITS/ADC staff at UCSC for their assistance in setting up the MAUS website, and Ben Sherman for sharing his python implementation of the satisfiability algorithm designed by Dimitris Achlioptas. This research was supported through NIGMS (R35GM125034) and NIAID (R01AI143997) grants to N.G.S., and through a High-End Instrumentation (HIE) Grant (S10OD018455), which funded the 800 MHz NMR spectrometer at UCSC.

Author contributions

N.G.S. designed the project. N.G.S. and S.N. designed and S.N. implemented the MAUS web-interface. N.G.S., V.S.D.P., and A.C.M. recorded and analyzed NMR experiments. N.G.S., S.N., and V.S.D.P. wrote the paper, with feedback from all authors.

Data availability

Code availability, competing interests.

N.G.S. is listed as a co-inventor in a US Patent (US10871459B2) related to this work. The other authors declare no competing interests.

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Santrupti Nerli, Viviane S. De Paula.

The online version contains supplementary material available at 10.1038/s41467-021-20984-0.

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 29 January 2021

Backbone-independent NMR resonance assignments of methyl probes in large proteins

  • Santrupti Nerli 1 , 2   na1 ,
  • Viviane S. De Paula   ORCID: orcid.org/0000-0003-3171-6477 2 , 3   na1 ,
  • Andrew C. McShan 2 &
  • Nikolaos G. Sgourakis   ORCID: orcid.org/0000-0003-3655-3902 2  

Nature Communications volume  12 , Article number:  691 ( 2021 ) Cite this article

5640 Accesses

25 Citations

5 Altmetric

Metrics details

  • Solution-state NMR
  • Structural biology

Methyl-specific isotope labeling is a powerful tool to study the structure, dynamics and interactions of large proteins and protein complexes by solution-state NMR. However, widespread applications of this methodology have been limited by challenges in obtaining confident resonance assignments. Here, we present Methyl Assignments Using Satisfiability (MAUS), leveraging Nuclear Overhauser Effect cross-peak data, peak residue type classification and a known 3D structure or structural model to provide robust resonance assignments consistent with all the experimental inputs. Using data recorded for targets with known assignments in the 10–45 kDa size range, MAUS outperforms existing methods by up to 25,000 times in speed while maintaining 100% accuracy. We derive de novo assignments for multiple Cas9 nuclease domains, demonstrating that the methyl resonances of multi-domain proteins can be assigned accurately in a matter of days, while reducing biases introduced by manual pre-processing of the raw NOE data. MAUS is available through an online web-server.

Similar content being viewed by others

nmr backbone assignment

Automatic structure-based NMR methyl resonance assignment in large proteins

nmr backbone assignment

Robust automated backbone triple resonance NMR assignments of proteins using Bayesian-based simulated annealing

nmr backbone assignment

Methyl probes in proteins for determining ligand binding mode in weak protein–ligand complexes

Introduction.

The use of methyl probes has opened new avenues for the application of nuclear magnetic resonance (NMR) methods to study large molecular machines 1 . The signal enhancement offered by methyl-transverse relaxation optimized spectroscopy (TROSY) techniques 2 and a suite of experiments for quantitative characterization of protein dynamics occurring over a broad range of timescales 3 have rendered methyl-based NMR a formidable tool for detailed mechanistic studies of important biological systems 4 . The main bottleneck in applications of methyl-based NMR is obtaining confident resonance assignments. In the conventional approach, backbone assignments are first established using triple-resonance experiments 5 . Then, methyl resonances are connected to the backbone using either methyl out-and-back experiments 6 or, more commonly, using 15 N and 13 C edited amide-to-methyl nuclear Overhauser effect (NOE) measurements 7 . However, in the absence of previously established backbone assignments, deriving confident assignments for methyl resonances remains a challenge. Although site-directed mutagenesis of individual methyl-bearing residues 8 provides unambiguous assignments, the laborious, costly, and time-consuming nature of this approach limits applications to study larger, multi-domain proteins.

Recently, several automated methyl assignment methods have been proposed 9 , 10 , 11 , 12 . To circumvent the need for existing backbone assignments, methyl NOE data and a known structure of the target protein can be used to derive a set of possible assignments, by fitting local NOE networks to methyl distances derived from the three-dimensional (3D) structure. In general, these methods start from fitting sub-sets of NOE connectivities to local clusters of methyls in the structure and then expand those to derive self-consistent assignments for the remaining methyl resonances. The NOE peak intensities are then used in an optimization process aiming to further reduce the space of solutions. However, any method that uses local information is bound to make mistakes globally and, as a result, most of these methods can miss the correct (ground truth) assignments from their scope of solutions, yielding a significant (up to 55%) error rate. An alternative approach is to perform an exhaustive mapping of the global NOE network to the target structure. This strategy was first outlined in the method MAGMA 13 , which treats methyl assignments as a maximum subgraph matching problem, and invokes a graph theory algorithm 14 to enumerate all assignments, which satisfy the maximum number of NOE connectivities. If the input contains only true positive NOE data (i.e., the NOE data graph is a subgraph of the structure graph), then MAGMA resorts to the VF2 algorithm 15 . Alternatively, maximum subgraph matching provides a reasonable compromise to account for the minimum number of false-positive NOEs, with the caveat that the ground truth solution may entail additional misidentified NOEs in the input. Moreover, MAGMA requires users to provide an annotated data graph, derived through an extensive, manual analysis of 3D or four-dimensional (4D) NOE spectroscopy (NOESY) peak data, which introduces an additional processing step, limiting its use by non-experts.

In the present work, we describe an automated system (MAUS: Methyl Assignments Using Satisfiability), which first formulates a set of rules, and then provides a compact description of all assignment possibilities that are consistent with these rules. Specifically, MAUS generates a structure graph, G , representing all methyl NOE connectivities present in an input Protein Data Bank (PDB) structure or structural model of the protein of interest, and multiple independent data graphs, H , containing all possible NOE networks, which can be derived from a list of raw 3D or 4D NOESY peaks. The NOE network is supplemented with residue type, stereospecificity, and geminal methyl connectivity constraints. Then, MAUS leverages an efficient algorithm to determine all valid ways of mapping every H into G (termed subgraph isomorphism), which respects all the experimental inputs. We test our method on a benchmark set of protein targets in the 10–45 kDa size range and show that MAUS maintains a robust performance, providing 100% accurate assignments at high levels of completeness, while offering a significant performance advantage relative to existing methods using the same inputs. Using MAUS, the methyl resonances of large, multi-domain proteins can be assigned accurately in a matter of days, completely bypassing the need for more laborious backbone-based NMR spectroscopy approaches.

Results and discussion

Methyl assignments as a subgraph isomorphism problem.

Rather than treating the methyl assignments as a maximum subgraph matching problem 13 , MAUS models the NOE data as a sparse sample of all possible connectivities present in the input structure. MAUS uses the NOE network together with additional experimental inputs, such as peak residue type information and geminal methyl resonance connectivities, to build a system of hard constraints. The constraints outline a subgraph isomorphism problem of fitting a sparse data graph H , into the original structure graph G (see “Methods” and Fig.  1a ).

figure 1

a Iterative workflow of the MAUS system. b Description of a structure graph, G with (~50–150) methyls as nodes. The edges of G correspond to all possible short-range (blue; up to 6–8 Å), long-range (black; from 6 up to 10 Å), and geminal methyl connectivities (green). c The 2D projections of 3D or 4D NOESY peaks are clustered (within tolerances; gray circles) to 2D HMQC reference peak positions (P1…P8). A NOESY peak can be clustered uniquely (red) or ambiguously (green and blue; overlapping peaks), leading to ~2 30 clustering combinations for typical 3D C M -C M H M NOESY data. d The observed chemical shift coordinates (C 1 ,C 2 ,H 2 ) of all NOE peaks are used to construct a symmetrization graph, S , with partitions representing upper (C 1  > C 2 ) and lower (C 2  > C 1 ) diagonal cross-peaks. S has nodes represented by short-range (blue) and long-range (black) NOEs, and edges connecting potentially symmetric NOE peaks. S has components of sizes 1 (no symmetry or NS), 2 and 3 (simple), and >3 (complex, CC), producing ~2 14 possibilities in the case of data recorded for maltose-binding protein, MBP. Simple components are used as constraints whereas complex components are reduced using an iterative process within MAUS: the maximum matchings (MMs) of each complex component are tested for satisfiability; satisfiable edges are retained in S . e A N-ary tree showing data graphs ( H 1 ,H 2 ,H 3 ,…,H n ) generated from all clustering and symmetrization alternatives (~2 44 possibilities); a majority of these data graphs do not lead to satisfying assignments and are eliminated by MAUS (red circles). Each remaining H can be mapped in ~2 100 possible ways onto G ; all valid methyl resonance assignments are enumerated and presented to the user in a compact form.

To fully account for all methyl connectivities consistent with the input structure and spin diffusion effects 16 , alternative side-chain rotamers are modeled using the program Rosetta 17 and maximum distances of 8 and 10 Å are applied to derive a structure graph, G , containing all possible short- and long-range NOEs, respectively (Fig.  1b ). In addition, MAUS explicitly considers (i) all possible mappings of 3D NOE cross-peaks to two-dimensional (2D) reference peaks, or clusters (Fig.  1c ), and (ii) all possible matchings between upper-diagonal and lower-diagonal NOE cross-peaks, formally analyzed as connected components of a bipartite symmetrization graph (Fig.  1d ). This approach relieves the user from the burden of interpreting the raw data (3D or 4D NOE peaks), through an explicit and exhaustive consideration of all possible data graphs consistent with the input NOE peaks (Fig.  1e ).

MAUS leverages a special-purpose constraint satisfaction solver (SAT) to enumerate all valid assignments using an iterative ansatz (see “Methods”). Using 3275 simulations of NOE data and structure graphs from a non-redundant set of 147 protein structures, we show that, relative to the VF2 algorithm 15 , SAT maintains a robust performance for problems of different sizes, topologies, and data sparsity levels, delivering accurate assignments in a matter of seconds (Supplementary Fig.  1 ). Finally, using the MAGMA benchmark set of eight targets 13 with the identical structure and data graphs provided to both programs, we find that, although MAGMA is marginally faster than MAUS for targets, which can be solved relatively quickly by both methods, MAUS maintains a consistent performance across all targets, including larger, more complex cases such as the 81.4 kDa maltose synthase G (Supplementary Table  1 , Supplementary Results ).

MAUS workflow and results on targets with known assignments

We tested MAUS using representative data sets recorded for a benchmark set of four protein targets spanning a range of sizes, folds and domain complexities: human β 2 -microglobulin (Hβ 2 m; 12 kDa, all-β-fold, single domain), maltose-binding protein (MBP; 41 kDa, all-α-fold, two-domain), and two major histocompatibility complex class-I (MHC-I) molecules of divergent heavy-chain sequences (HLA-A01; 45.5 kDa and HLA-A02; 44.8 kDa, mixed α/β-fold, three-domain) (Table  1 ). The X-ray structures and ground truth assignments for these targets were obtained from the PDB and BMRB, respectively. To obtain a consistent set of experimental data for all targets, we prepared 13 C/ 1 H (MA)ILV-methyl-labeled samples and acquired one 2D reference 1 H- 13 C heteronuclear multiple quantum coherence (HMQC) and two 3D 13 C M - 13 C M 1 H M NOESY-HMQC spectra, recorded with short (50 ms) and long (300 ms) mixing times (Fig.  2 , Supplementary Fig.  2 , and Supplementary Table  2 ). Besides providing supplementary short-range (<6–8 Å) constraints, the short mixing time NOEs help reduce complex components of the symmetrization graph (Fig.  1d ) and allow determination of geminal connectivities for resonances corresponding to 13 Cγ 1 / 13 Cγ 2 and 13 Cδ 1 / 13 Cδ 2 methyls of Val and Leu residues, respectively. Finally, towards reducing spectral overlap in the 2D reference spectra of larger (>20 kDa) targets, we prepared proS-labeled samples, stereospecifically defining the resonances of Leu/Val methyls (Fig.  2 ). Using this information as input for MAUS, we find that among all possible clustering/symmetrization alternatives resulting from peak overlap (2 40 -2 80 ), only a tractable number (2 10 -2 20 ) can lead to valid assignment solutions (Supplementary Table  3 ) and are further explored in exhaustive subgraph isomorphism ( H into G ) enumerations using the SAT algorithm (Fig.  1e ). Given the sparsity of experimental NOE data sets and our definition of a valid solution, each subgraph isomorphism instance has up to 2 100 valid solutions for a typical 100-methyl protein. However, the solution space is not uniformly distributed among methyl peaks; remarkably, results for all targets in our set show that the NOE network, residue type and stereospecificity constraints are sufficient to provide unambiguous assignments for a large fraction (64–89%) and low-ambiguity (two to three) options for the majority (11–30%) of remaining methyl resonances (Tables  1 and 2 , and Supplementary Fig.  3 ).

figure 2

Two separate protein samples are recommended for generating the standard inputs of MAUS. Sample 1 uses non-stereospecific methyl labeling with either (i) 13 C/ 1 H labeling only at the methyls of Met, Ala, Leu, Val, and Ile residues, on a 12 C/ 2 H background or (ii) 13 C/ 1 H labeling at the methyls and 13 C/ 2 H at the sidechains of methyl-bearing residues on a 12 C/ 2 H background, creating linearized spin systems 43 , which can be used to unambiguously distinguish Leu/Val methyl resonances 18 (Supplementary Figs.  4–6 ). When sample i is used, resonances of Leu/Val peaks can be partially distinguished using an automated chemical shift-based classifier within MAUS (“Methods”). Both samples can be used to define reference methyl chemical shifts using real-time (sample i) or constant-time (sample ii) 2D methyl-HMQC experiments. Long-range (up to 10 Å) NOEs are recorded using a 3D C M -C M H M SOFAST NOESY experiment (300 ms mixing time). A complementary NOESY experiment recorded with a short (typically 50 ms) mixing time is used to identify short-range NOEs, including between the geminal methyl resonances of Leu and Val residues. An additional protein sample 2 is prepared with stereospecific labeling (proS or proR) of Leu and Val methyls, which resolves spectral overlap in the 2D reference HMQC spectrum, and, together with sample 1 , allows unambiguous determination of geminal methyl pairs for Leu/Val. 13 C nuclei are displayed in red. Steps of NMR data analysis are colored green, with the resulting data set illustrated as a red rectangle. The standard input files for MAUS (blue) are the primary sequence, the PDB coordinate file or model structure, the 2D 1 H- 13 C HMQC peak list, and two 3D C M -C M H M SOFAST NOESY peak lists (recorded with short and long mixing time). MAUS also accepts partial assignments that could be included in the user-annotated 2D 1 H- 13 C HMQC list, together with a specification of allowed residue types, stereospecificity, and geminal methyl connectivities (if present) for each 2D reference peak. The output includes methyl assignment statistics, residue type classification, NOE assignments, and final lists of assignment options. Although MAUS does not consider the NOE peak intensities, this information can be evaluated by the user toward further reducing assignment ambiguities in the output lists.

Robustness of MAUS to missing experimental inputs

For optimal results, MAUS normally requires an NOE network with degree connectivity of 3.5 or greater (corresponding to an average of 1.75 experimentally observed NOE peaks per methyl). In order to unambiguously define the resonances of Leu and Val methyls and apply these as constraints to MAUS, we recommend the use of a homonuclear decoupling method 18 (Supplementary Fig.  4c, d ). However, to account for cases where peak residue type information is not available experimentally, we also tested MAUS on all targets by either (i) defining resonances in the Leu/Val region as either Leu or Val, or (ii) engaging a chemical shift-based classifier, within MAUS (see “Methods”). Removing Leu/Val residue type information has a significant impact on MAUS performance, resulting in an average of 27% decrease in assignment completeness across our benchmark set (Table  2 , columns 3 and 4). Here, the use of the MAUS classifier to distinguish between confident Leu and Val methyl resonances led to an improved average of 18% less uniquely assigned peaks. Nonetheless, the accuracy of all assignments remained 100% and the ground truth solution was valid across all targets (Table  2 and Supplementary Table  4 ). For example, for MBP, MAUS assigned 71% of resonances uniquely and an additional 17% with two or three options while maintaining 100% accuracy, despite the absence of experimental residue type information. In addition, if users cannot provide any annotations of the 2D reference peaks in the input, then MAUS inherently loses geminal connectivity constraints for all methyl resonances corresponding to Leu and Val residues, in addition to the residue type. It is therefore recommended, when using data sets where both geminal methyls are labeled, that MAUS is run using either missing geminal connectivity or Leu/Val residue type information, but not both.

Performance of MAUS using simulated NOE networks from PDB structures

Although results using our benchmark set of four targets with known assignments and recently recorded data highlight the utility of our method for proteins of sizes up to ~46 kDa, methyl-based NMR can be applied on much higher-molecular weight proteins. To test the size limit supported by MAUS, we expanded our simulation benchmark set to include 63 PDB structures with structure graphs containing more than 100 methyls from Ile, Leu and Val residues (corresponding to system sizes of up to 116 kDa) (Fig.  3 ). We chose to simulate assignment cases corresponding to the Isoleucine-Leucine-Valine (ILV), as opposed to the Alanine-Isoleucine-Leucine-Valine (AILV) labeling scheme, as a large fraction of the resonances corresponding to Ala methyls are missing from the spectra of larger proteins. For each target, a data graph was simulated by removing edges from the corresponding structure graph (defined as all methyl connectivities up to 10 Å present in the 3D structure) until a degree connectivity (defined as 2 × number of edges/number of nodes) of 3.5 or higher was reached. This value was chosen to represent experimental cases with 1.75–2.1 observed NOEs/methyl resonance, which are characteristic of real-life data sets. We find that, as long as this requirement is satisfied, MAUS can efficiently tackle the computational complexity of the graph-matching problem and provide meaningful assignments in a reasonable time (up to 4 h on a single central processing unit, CPU), even for larger targets. Specifically, our simulation results show that we can obtain a high coverage (60–80%) of methyl groups with 1–3 residue assignment options for targets up to 352 methyls. Moreover, MAUS can address proteins of up to 716 methyls/200 kDa (as exemplified by the Teneurin 2 partial extracellular domain, PDB ID: 6FB3), albeit in a longer time (11 h on a single CPU), which is still feasible from a computational standpoint.

figure 3

a Top: target size distribution (in terms of number of methyl groups from Ile, Leu, and Val residues) of 63 proteins with high-resolution PDB structures of sizes up to 116 kDa (352 methyls; PDB ID: 5WTI) (top). For each target, a data graph H was simulated by removing edges from the corresponding structure graph G (defined as all methyl connectivities up to 10 Å present in the 3D structure) until a degree connectivity (defined as 2 × number of edges/number of nodes) in the range of 3.5–4.2 was reached, corresponding to 1.75–2.1 simulated NOEs/methyl. Bottom: bar plot showing the distribution of degree connectivities among all simulated data graphs. b Scatter plot showing run time (in minutes) taken by MAUS to perform exhaustive enumeration of the possible methyl assignment options for each H into G mapping case. Colors indicate % of methyls with up to three assignment options, according to the scale on the right.

De novo resonance assignments of Cas9 nuclease domains

We further evaluated the performance of MAUS on deriving de novo, blind assignments for four representative targets; three domains of the Cas9 nuclease and the therapeutic cytokine interleukin-2 (IL-2) (15.4 kDa) (Fig.  4 and Supplementary Figs.  5–7 ). These comprise biomedically relevant proteins of mostly α-helical folds, leading to a high degree of spectral overlap, which challenges automated assignment methods. Cas9 is a 160 kDa RNA-guided endonuclease, which introduces DNA double-strand breaks upon site-specific recognition of a short nucleotide Protospacer Adjacent Motif, preceding the cleavage site 19 . The Cas9 enzyme comprises of a recognition lobe (REC) that forms an RNA:DNA hybrid through three subdomains (REC1–3), and a nuclease lobe with HNH and RuvC domains, which cleave the DNA strand that is complementary and non-complementary to the guide RNA, respectively (Fig.  4a, b ). Towards establishing methyl assignments, we designed optimized constructs of individual Cas9 domains, HNH (15.7 kDa), REC2 (15.6 kDa), and REC3 (24.5 kDa), showing well-dispersed methyl NMR spectra (Fig.  4c ). For these three domains, MAUS assigned 89%, 59%, and 67% of methyl resonances uniquely, and further provided assignment lists with two to three options for 11%, 12%, and 28%, respectively, while always maintaining 100% accuracy, relative to the ground truth assignments (Fig.  4d , Table  1 , and “Methods”). We applied a selective-decoupling experiment to distinguish Leu/Val resonances 18 (Supplementary Figs.  6 and 7 ) and used conventional methods to obtain reference backbone and methyl assignments for validating the MAUS results ( Supplementary Results ). Concurrently with the development of the present work, the backbone and side-chain assignments for the HNH domain were obtained using a conventional backbone-based approach and released in the Biological Magnetic Resonance Data Bank (BMRB, entry 27949) 20 , 21 . The MAUS-derived methyl assignments were in full agreement with the published results, which further highlights the practical utility of our method in saving machine time and manual effort spent.

figure 4

a Domain organization of Spy Cas9 composed of the recognition lobe (REC) and nuclease lobe (NUC). BH, bridge helix; PI, PAM-interacting. b Surface representation of SpyCas9 (PDB ID 4cmp) depicting the bilobed architecture. Protein domains are colored as in a . c Single domains used in the divide-and-conquer approach. 1 H- 13 C HMQC spectra of selectively labeled HNH (green), REC2 (red), and REC3 (blue) domains at Ile δ 1 - 13 CH 3 ; Leu, Val- 13 CH 3 / 13 CH 3 positions, acquired at 800 MHz, 25 °C. The spectra of the individual domains of Cas9 indicate that they retain their proper fold when in isolation. d Number of valid assignment options for each residue identified by MAUS for HNH, REC2, and REC3, respectively. The colored spheres represent final valid resonance assignment options: violet (1 option), green (2 options), yellow (3 options), and red (>3 options).

For IL-2, MAUS identified unique assignments for 74% and provided a short list (two to three) of confident options for an additional 23% of all methyl resonances (Table  1 and Supplementary Fig.  3 ). With the MAUS lists in hand, we manually considered the NOE peak intensities to reach complete (>95%) assignments for all targets in a matter of hours. Our results show that, together with the quality of the NMR sample, the protein fold and methyl chemical shift dispersion is equally important for obtaining complete assignments. Notably, the amide 15 N- 1 H TROSY spectrum of the Cas9 REC3 domain exhibits a significant fraction of missing backbone amide resonances (~55%, residues 660–712), likely due to conformational exchange-induced line broadening. Thus, methyl-based NMR supported by MAUS, is a practically useful tool to study not only larger systems with intractable backbone amide spectra, but also enabling routine applications for the study of medium-sized targets.

Comparison with previous methods

When compared with other methyl assignment tools (MAGIC/FLAMEnGO2.4/MAP-XSII) using the same inputs for all targets with known assignments in our benchmark set, MAUS produces 100% accurate results, while the previous methods show error rates of up to (34/16/56%), and significantly lower assignment completeness rates (Table  2 and Supplementary Table  5 ). Consistently with the results obtained for our benchmark set, MAUS maintains an improved performance on the 4 blind targets in our set (IL-2, REC2, HNH, and REC3), in terms of both unambiguous assignments and error rate (Table  2 ).

A recent commercially available method, MethylFLYA 12 , outperforms the existing publicly available methyl assignment tools. We carried out a detailed comparison of MAUS (also readily available to users via an online interface) with MethylFLYA on the eight targets used in our study, by providing the identical inputs to both methods. Upon comparison of the number of resonances assigned uniquely by MAUS and with high confidence by MethylFLYA, we find that MAUS outperforms MethyFLYA for 6/8 targets in our set (Hβ 2 m, MBP, IL-2, HNH, REC2, and REC3). Specifically, (i) for IL-2 and REC2, MethylFLYA reports a significantly lower number of confident assignments (38% and 10%) relative to MAUS (74% and 60%), while also producing two incorrect assignments for each target, (ii) for HNH and MBP, it produces confident assignments at lower completeness levels, and (iii) for Hβ 2 m and REC3, MethylFLYA crashes (segmentation fault), producing no results. Although MethylFLYA assigns a higher fraction of methyl resonances for the two larger targets in our set (HLA-A01 and HLA-A02), it requires a prohibitively longer computation time on a single CPU and, in general, it is less efficient than MAUS by two to three orders of magnitude (Table  2 ). Finally, although MethylFLYA provides low-confidence options for some of the unassigned methyl resonances, MAUS provides a list of valid options for each 2D reference peak, which contains the correct assignment, owning to the fact that it applies a satisfiability-based enumeration process that is both exact, and exhaustive.

Consistent methyl assignments guided by sequence-based models

The structures of targets which lack a representative template in the PDB can be modeled using homology-based methods 22 or, in cases with no identifiable sequence homologs in the PDB, sequence co-evolution approaches 23 , 24 . To examine the latter, more challenging case, we used trRosetta (transform-restrained Rosetta) 25 , which has demonstrated high accuracy for complex Critical Assessment of Structure Prediction 26 targets. We derived a set of models for a representative case in our benchmark set, MBP using the trRosetta web-server, and used this together with our experimental NOE data recorded on a MILV proS sample (Supplementary Table  2 ) to run MAUS. The models showed a 3.3 Å average all-atom root-mean-square deviation (RMSD) from the X-ray structure, nonetheless, MAUS still provided unique and low-ambiguity (two to three options) assignments for 68% of resonances (relative to 96% using the X-ray structure), while maintaining 100% accuracy (Fig.  5a ).

figure 5

a Top: flowchart showing the combined de novo structure modeling and methyl resonance assignment strategy using trRosetta (transform-restrained Rosetta) and MAUS, respectively. (bottom) Overlay of trRosetta model (wheat) and X-ray structure (PDB ID: 1DMB; pale cyan) of maltose-binding protein. Number of resonance assignment options identified by MAUS are indicated by colored spheres (violet: 1, green: 2, yellow: 3 and red: >3). b Top: structures of a designed homotrimeric protein 27 , in the open (solved by X-ray crystallography) and closed (solved by solution NMR) states showing inter-methyl distances corresponding to experimental NOEs identified in a 3D C M -C M H M NOESY spectrum, assigned manually 27 . Gray spheres show methyl residues participating in the NOE network. The dashed lines represent pairwise distances in the two models, also shown as bar plots (bottom) for the open (gray) and closed (golden) states. The dashed line in the bar plot highlights the 10 Å NOE upper distance limit used by MAUS. Using the unassigned NOE peaks, MAUS delivers a set of satisfying assignments when provided with a model of the closed, but not of the open structure as input.

As structure prediction methods may also lead to globally incorrect models, we examined whether MAUS can distinguish between two closely related structures of the same protein sequence directly from unprocessed 3D NOE peaks (picked directly from 3D NOESY data using a signal-to-noise (S/N) ratio ≥ 5). To test this, we used a designed homotrimeric protein (XAA) consisting of a multi-layer, three-helix bundle with modular termini that may adopt two divergent structures corresponding to an open (observed by X-ray) and a closed state, which corresponds to the solution structure determined using manual NOE assignments by our group 27 . Starting from the unassigned NOE peaks, we ran MAUS using both models as inputs including both intra- and inter-chain connectivities in the structure graph G , and found that only the closed state led to assignments which satisfy the global NOE network within our default 10 Å upper limit. MAUS finds that the data graph is not isomorphic to the structure graph derived from the open state, due to a subset of NOE constraints that are violated by more than 25 Å (Fig.  5b ). This result justifies our choice of an exact subgraph isomorphism approach, since any attempt to arbitrarily remove NOEs (i.e., edges of H ) not explained by the input structure (i.e., edges of G ) could lead to a cascade effect, resulting to global changes in the space of solutions.

In summary, we demonstrate that a satisfiability-based approach can deliver reliable assignments for a range of targets amenable to solution-state NMR, using unprocessed NOE peak data (3D or 4D peak lists) and an existing structural model with the correct overall fold. Our results using NOESY spectra recorded for proteins with known assignments as well as for several blind targets show that, unlike previous methods, MAUS provides 100% accurate solutions for a large fraction (60–80%) of methyl groups, thereby reducing manual effort, costs, and errors introduced due to manual pre-processing and validation of the data. We further demonstrate that, in the absence of high-resolution structures or structural homologs in the PDB, sequence co-evolution-based models can be used by MAUS, without compromising the correctness of produced assignments. Alternatively, for larger, multi-domain proteins with complex methyl spectra, a user may apply a divide-and-conquer strategy, supported by MAUS. Using our online web-server, users can now assign the methyl spectra of large, multi-domain proteins in a matter of a few days, also considering the time it takes to record the NMR data. Our method opens new possibilities for studying challenging, complex molecular machines, as illustrated here for the Cas9 nuclease.

Protein expression and purification of benchmark targets

DNA corresponding to the luminal domains of Hβ 2 m, HLA-A*01:01, and HLA-A*02:01 expressed in Escherichia coli BL21 (DE3) cells, refolded and purified as described 27 , 28 . Briefly, Hβ 2 m, HLA-A*01:01, and HLA-A*02:01 induced with 1 mM isopropyl β- d -1-thiogalactopyranoside (IPTG) at an OD600 of 0.7 at 37 °C for 4 h, isolated from inclusion bodies and refolded in vitro. Inclusion bodies were isolated from E. coli cell pellets by sonication, following by a wash with 100 mM Tris pH 8, 2 mM EDTA, 0.1% (v/v) deoxycholate, and solubilization in 5.5 M guanidine hydrochloride under reducing conditions. For in vitro refolding of Hβ 2 m, 100 mg protein was slowly diluted dropwise over 24 h into refolding buffer (0.4 M arginine-HCl, 2 mM EDTA, 4.9 mM reduced l -glutathione, 0.57 mM oxidized l -glutathione, 100 mM Tris pH 8.0) at 4 °C while stirring. For in vitro refolding of HLA-A*01:01 and HLA-A*02:01, 10 mg of peptide (ILDTAGKEEY for HLA-A*01:01 and LLFGYPVYV for HLA-A*02:01), and 100 mg heavy chain and 100 mg Hβ 2 m were slowly diluted dropwise over 24 h into refolding buffer (0.4 M arginine-HCl, 2 mM EDTA, 4.9 mM reduced l -glutathione, 0.57 mM oxidized l -glutathione, 100 mM Tris pH 8.0) at 4 °C while stirring. All refolding proceeded for 4 days at 4 °C without stirring. Purification of Hβ 2 m, HLA-A*01:01, and HLA-A*02:01 was performed by size-exclusion chromatography (SEC) with a HiLoad 16/600 Superdex 75 pg column at 1 mL/min with running buffer (150 mM NaCl, 25 mM Tris pH 8).

XAA was expressed in E. coli BL21 Star (DE3) cells and purified as described 26 . Briefly, XAA was induced with 0.1 mM IPTG at an OD600 of 0.6 and expressed at 18 °C for 17 h. Cells were resuspended in lysis buffer (25 mM Tris pH 8, 300 mM NaCl, 20 mM imidazole) and lysed by sonication. Supernatant was applied to nickel-nitrilotriacetic acid (Ni-NTA) resin pre-equilibrated with lysis buffer. The column was rinsed with ten column volumes of wash buffer (25 mM Tris pH 8, 300 mM NaCl, 20 mM imidazole). XAA was eluted using two column volumes elution buffer (25 mM Tris pH 8, 300 mM NaCl, 250 mM imidazole). The hexahistidine tag was removed via thrombin cleavage by incubating with 1 : 5000 thrombin (EMD, Millipore) for 15 h at 25 °C. Cleaved XAA was purified by SEC using a Superdex 75 10/300 GL column (GE Healthcare) at 1 mL/min with running buffer (150 mM NaCl, 25 mM Tris pH 8).

The MBP-cyclodextrin complex was prepared as described previously 29 . Briefly, MBP was expressed in E. coli strain BL21(DE3) cells, induced with 1 mM IPTG at an OD600 of 0.7 and expressed at 37 °C for 5 h. Purification of MBP-cyclodextrin was achieved as follows. First, MBP was bound to an amylose affinity column in the presence of binding buffer (150 mM NaCl, 25 mM Tris pH 8, 1 mM EDTA) and then eluted using eluted buffer (150 mM NaCl, 25 mM Tris pH 8, 1 mM EDTA, 10 mM maltose). Second, MBP was further purified by SEC with a HiLoad 16/600 Superdex 75 pg column at 1 mL/min with running buffer (150 mM NaCl, 25 mM Tris pH 8). MBP was then partially unfolded for 3 h at 25 °C in 150 mM NaCl, 25 mM Tris pH 8, 2.5 M guanidinium hydrochloride, and refolded by dilution into GuHCl-free buffer containing 2 mM β-cyclodextrin. Following purification all proteins were exhaustively buffer exchanged in their respective buffers (see section “Stereospecific isotopic labeling”).

Streptococcus pyogenes Cas9 domain constructs

The coding sequences for Cas9 HNH (residues 776–908) and REC2 domains (residues 167–307) were synthesized, codon optimized for expression in E. coli , and subcloned into a His 10 -MBP expression vector (Genscript) (Supplementary Table  6 ). pSHS325 bacterial expression plasmid for SpCas9 REC3 domain was a gift from Jennifer Doudna & Keith Joung (Addgene plasmid #101205) 30 . Each protein was expressed in E. coli BL21 (DE3) containing chaperone plasmid pG-KJE8 (TAKARA, 3340) to enhance protein folding 31 and purified as described below 32 . Briefly, when cells reached an OD600 of ~0.6, IPTG was added to a final concentration of 0.5 mM to induce protein expression. Cells were then grown for an additional 18 h at 23 °C. Collected cells were resuspended in lysis buffer (50 mM Tris pH 7.5, 500 mM NaCl, 5% (v/v) glycerol, and 1 mM Tris(2-carboxy-ethyl) phosphine (TCEP) containing an EDTA-free protease inhibitor tablet (Roche). The cell suspension was sonicated on ice and clarified by centrifugation at 27,000 ×  g for 15 min. The soluble lysate fraction was bound in batch to Ni-NTA agarose (Qiagen). The resin was washed extensively with 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, 10 mM imidazole, and 5% (vol/vol) glycerol, and the bound protein was eluted in 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, 300 mM imidazole, and 5% (vol/vol) glycerol. The His 10 -MBP affinity tag was removed with His 10 -tagged TEV protease during overnight dialysis against 20 mM Tris pH 7.5, 500 mM NaCl, 1 mM TCEP, and 5% (vol/vol) glycerol. The protein was then flowed over Ni-NTA agarose to remove TEV (Tobacco Etch Virus) protease and the cleaved affinity tag, and further purified by SEC on a Superdex 200 16/60 column (GE Healthcare) in 20 mM Tris pH 7.5, 200 mM KCl, 1 mM TCEP, and 5% (vol/vol) glycerol.

Human IL-2 expression, refolding, and purification

Codon optimized DNA encoding the human IL-2 (amino acids 1–133) with a site-specific mutation (C140S) (Supplementary Table  6 ) was expressed in BL21 (DE3) E. coli cells as inclusion bodies. Protein expression was achieved by induction with 1 mM IPTG at an OD600 of 0.6 followed by cell growth at 37 °C for 5 h at 200 r.p.m. For in vitro refolding, ~30 mg of inclusion bodies was dropped diluted into 200 mL of refolding buffer (1.1 M guanidine, 6.5 mM cysteamine, 0.65 mM cystamine, 110 mM Tris pH 8.0) at 4 °C while stirring. Refolding proceeded overnight at 4 °C without stirring. The solution was dialyzed into a buffer of 20 mM MES pH 6.0, 25 mM sodium chloride. Purification of refolded IL-2 was performed by cation exchange chromatography with a CAPTO-SP column using a 25 mM to 1 M NaCl gradient in a buffer with 25 mM MES pH 6.0 followed by SEC with a Superdex 75 column (GE) at 0.5 mL/min with running buffer (50 mM NaCl, 20 mM sodium phosphate pH 6.0). Protein concentrations were determined using A 280 measurements on a NanoDrop with extinction coefficients estimated with the ExPASy ProtParam tool.

Preparation of NMR samples, backbone, and methyl assignments

All proteins were overexpressed in M9 medium in 2 H 2 O containing 2 g l −1 2 H 13 C glucose (Sigma #552151) and 1 g l −1 15 NH 4 Cl. Selective methyl labeling, referred to as ILV*, was achieved by the addition of appropriate precursors (ISOTEC Stable Isotope Products (Sigma-Aldrich) as detailed previously 28 , 33 . ILV-methyl (Ile 13 Cδ 1 ; Leu 13 Cδ 1 / 13 Cδ 2 ; Val 13 Cγ 1 / 13 Cγ 2 ) U-[ 15 N, 2 H]-labeled proteins were prepared in M9 medium in 2 H 2 O, supplemented with 2 g l −1 2 H 12 C glucose (Sigma #552003) and 1 g l −1 15 NH 4 Cl. De novo stereospecific methyl assignments were achieved by utilizing three independently prepared isotopically labeled samples: AILV or ILV (Ala 13 Cβ; Ile 13 Cδ 1 ; Leu 13 Cδ 1 / 13 Cδ 2 ; Val 13 Cγ 1 / 13 Cγ 2 in an otherwise U-[ 15 N, 12 C, 2 H] background), ILV proS (Ile 13 Cδ 1 ; Leu 13 Cδ 2 ; Val 13 Cγ 2 in an otherwise U-[ 15 N, 12 C, 2 H] background), and ILV* (Ile 13 Cδ 1 only; Leu 13 Cδ 1 / 13 Cδ 2 ; Val 13 Cγ 1 / 13 Cγ 2 in an otherwise U-[ 15 N, 13 C, 2 H] background). We employed a multipronged approach where backbone assignments are used to aid the assignment of side-chain methyl groups. Specifically, we first obtained backbone assignments using TROSY-based 3D HNCA, HN(CA)CB, and HNCO experiments recorded with the ILV*-labeled samples. Acquisition times of 30 ms ( 15 N), 14 ms ( 13 CO), and 10/5 ms ( 13 Cα / 13 Cβ) in the indirect dimension were used. Backbone amide assignments were confirmed using amide-to-amide NOEs obtained from 3D H M -NH N and 3D N-NH N SOFAST NOESY-HMQC experiments 7 . Final backbone assignments were further validated using TALOS-N. Next, Ala, Ile, Leu, and Val methyl assignments were achieved using methyl-to-methyl NOEs observed in 3D H M -C M H M and 3D C M -C M H M SOFAST NOESY-HMQC in addition to methyl-to-amide NOEs observed in 3D H N -C M H M and 3D C M -NH N SOFAST NOESY-HMQC experiments 7 recorded on the AILV- or ILV-labeled samples. For the 3D H N -C M H M SOFAST NOESY, the acquisition parameters were 64, 48, and 1280 complex points in the 1 H N , 13 C M , and 1 H M dimensions with corresponding acquisition times of 14.5, 10.8, and 79 ms with eight scans/FID. For the 3D N-C M H M SOFAST NOESY, the acquisition parameters were 48, 40, and 1280 complex points in the 15 N, 13 C M , and 1 H M dimensions with corresponding acquisition times of 26, 9, and 79 ms with eight scans/FID (free induction decay), respectively.

Leu/Val geminal pairs were determined by comparing NOE strips in 3D C M -C M H M SOFAST NOESY experiments recorded using short (50 ms) and long (300 ms) mixing times. The acquisition parameters for the 3D H M -C M H M SOFAST NOESY and 3D C M -C M H M SOFAST NOESY are provided in Supplementary Table  2 . Ile δ 1 methyl types were identified by their characteristic upfield chemical shifts. Leu and Val methyl types were identified using phase-sensitive 2D 1 H- 13 C HMQC experiments recorded on the ILV*-labeled sample 18 . Alanine methyl types were identified by comparing 2D 1 H- 13 C HMQC spectra of AILV and ILV proS samples, and by comparing Ala C β chemical shifts from HN(CA)CB experiments. Finally, stereospecific Leu δ 2 and Val γ 2 methyl assignments were obtained by comparison of 2D 1 H- 13 C HMQC spectra of AILV- and ILV proS -labeled samples. All the data were recorded at a 1 H field of 750 or 800 MHz at 25 °C. The 3D SOFAST NOESY experiments were recorded at 800 MHz, 25 °C using a recycle delay of 0.2 s, and NOE mixing time of 50 ms and 300 ms. Typical data acquisition times were 1 h for the 2D HMQC, 8–10 h for the 50 ms, and 12–14 h for the 200 ms 3D SOFAST NOESY-HMQC experiments. All spectra were acquired using Topspin 3 acquisition software from Bruker. The data were processed in NMRPipe 34 and analyzed in NMRFAM-SPARKY 35 and CcpNMR 36 programs.

Stereospecific isotopic labeling

A specifically methyl-labeled acetolactate precursor (2-[ 13 CH 3 ], 4-[ 2 H 3 ] acetolactate) was obtained through deprotection and exchange of the protons of the methyl group in position four of ethyl 2-hydroxy-2-( 13 C)methyl-3-oxobutanoate (FB reagents) achieved in D 2 O at pH 13 37 . Typically, 300 mg of ethyl 2-hydroxy-2-( 13 C)methyl-3-oxobutanoate was added to 24 mL of a 0.1 M NaOD/D 2 O solution. After 30 min, the solution was adjusted to neutral pH with DCl and 2 mL of 1 M Tris pH 8.0 in D 2 O was added. For the production of highly deuterated [U- 2 H], I-[ 13 CH 3 ]δ 1 , L-[ 13 CH 3 ]proS, V-[ 13 CH 3 ]proS samples, 300 mg/L of 2-[ 13 CH 3 ], 4-[ 2 H 3 ] acetolactate, prepared as described above, was added 1 h prior to induction (OD600 ≈ 0.55). Forty minutes later (i.e., 20 min prior to induction), 3,3-[ 2 H 2 ],4-[ 13 C]-2-ketobutyrate (SIGMA #589276) was added to a final concentration of 60 mg/L. Each protein was induced as described above.

For each labeled protein sample, the concentration and buffer composition was as follows:

- 0.5 mM HLA-A01 in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 0.5 mM HLA-A02 in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 1.3 mM Hβ 2 m in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 0.5 mM XAA domain in 20 mM sodium phosphate (pH 6.2), 100 mM NaCl, 5% D 2 O.

- 0.8 mM MBP domain in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 2.0 mM HNH domain in 20 mM HEPES (pH 7.5), 80 mM KCl, 5% D 2 O.

- 0.5 mM REC2 domain in 20 mM sodium phosphate (pH 7.2), 50 mM NaCl, 5% D 2 O.

- 0.7 mM REC3 domain in 20 mM HEPES (pH 7.5), 200 mM KCl, 5% deuterated glycerol-d8, 1 mM TCEP, 0.01% NaN 3 , 5% D 2 O.

- 0.4 mM IL-2 in 20 mM sodium phosphate (pH 6.0), 50 mM NaCl, 5% D 2 O.

Generating a structure graph G in MAUS system

The input PDB structure is utilized by MAUS to construct an undirected graph G (Fig.  1b ). First, for all pairs of methyls in the input structure, we compute the following function of the average distance between all pairs of protons from the two methyl groups (Eq. ( 1 )):

Where p and q are methyl groups and p i  and q j represent protons of each methyl group in an input structure s , and d denotes the Euclidean distance. When calculating an effective distance between the nine pairs of methyl protons, the average value of the distances (i.e., multiplicity correction) used here results in a minor increase in upper distance bounds relative to more commonly used r −6 summation. For instance, if we have three distance measures 4 Å, 5.5 Å, and 7 Å, the regular r −6 summation results in a value of 3.9 Å, whereas the average value is 4.7 Å, which remains within the range of the observed interproton distances. Due to the fact that MAUS explicitly considers different side-chain rotamers in addition to applying relatively large upper distance thresholds, the exact choice of distance averaging does not influence our results significantly (as opposed to a more precise estimate of upper bounds that is required for NMR structure determination applications).

To account for alternative side-chain rotamers, the input structure is subjected to n (quick: 10 or thorough: 100) independent relaxations using the FastRelax 38 protocol in Rosetta. Therefore, the element of the adjacency matrix Δ( p ,  q ) is defined by taking the minimum of the distance functions observed among all sampled conformations (Eq. ( 2 )):

We call an edge of the structure graph, G (i) geminal, if it corresponds to a connectivity between the γ 1 / γ 2 and δ 1 / δ 2 methyls of Leu and Val residues, (ii) long-range if 6 Å < Δ( p ,  q ) ≤ 10 Å, and (iii) short-range if Δ( p ,  q ) ≤ 6 Å.

Deriving all possible data graphs, H , from the input NOE data

The two 3D NOESY spectra (recorded with short and long mixing times) are picked at a typical S/N level of 5 or higher and provided as an input list of (C1, C2, H2) coordinates. In addition, a high-resolution 2D reference methyl-HMQC spectrum is picked to identify all reference (C, H) methyl resonances. For larger (>20 kDa) protein targets, simultaneous consideration of the 2D spectrum from a separate, stereospecifically labeled proS sample helps resolve overlap, to identify the exact 13 C and 1 H chemical shifts corresponding to each observable methyl in the NMR sample. The 2D reference and two 3D NOESY peak lists are provided as inputs to MAUS.

We model all alternative NOE connectivities resulting from overlap in the input 2D and 3D spectra, to allow an explicit consideration of all possible data graphs, H , which are consistent with the input data. First, a 3D NOESY cross-peak can be projected to one or several 2D reference peaks (Fig.  1c ). In particular, a 3D maximum (C1, C2, H2) can be projected to a 2D maximum (C, H) if and only if the following holds true (Eqs. ( 3 ) and ( 4 )):

The tolerance values are set, by default, to 0.15 p.p.m. for 13 C and 0.02 p.p.m. for 1 H chemical shifts. NOE peaks with no identifiable projection, as well as diagonal NOEs are eliminated. This process yields clusters of tentative NOE connectivities for each peak in the 2D reference set.

Second, each valid (true positive) connectivity present in the input data must arise from one upper-diagonal NOE cross-peak with one corresponding, symmetry-related lower-diagonal counterpart. However, identifying unique symmetry relations between pairs of 3D NOE cross-peaks can be challenging due to spectral overlap. Specifically, two 3D NOE peaks (C1, C2, H2) and (C1′, C2′, H2′) are potentially symmetric (Eqs. ( 5 ) and ( 6 )) if and only if

To explore all possible symmetry relations present in the input data, we construct a bipartite symmetrization graph connecting upper-diagonal to lower-diagonal NOE cross-peaks (Fig.  1d ). An edge appears between all pairs of peaks that are potentially symmetric. For typical NOE data sets, the bipartite symmetrization graph consists of several connected components (Fig.  1d ). Isolated peaks are eliminated. Components of sizes 2 and 3, termed simple, provide edges of the data graph, H . Larger, complex components are iteratively reduced to smaller ones using explicit satisfiability, eliminating edges that do not yield satisfying assignments in exhaustive enumerations, as outlined in detail below (Fig.  1d ).

Reducing subgraph isomorphism to satisfiability

A boolean function ƒ on a set of boolean variables \(x\; = \;\{ x_1,\;x_2,\;x_3\,...,\;x_n\}\) is said to be satisfiable if there exists an assignment \(x\; \in \{ 0,1\} ^n\) such that ƒ ( x ) evaluates to 1. The question of whether a function f is satisfiable is known as the satisfiability (SAT) problem in computer science. A boolean function ƒ is in conjunctive normal form (CNF) if it is a conjunction of one or more clauses, where a clause is a disjunction of literals; in other words, it is an AND of ORs. Although satisfiability is NP-complete, i.e., theory suggests that no polynomial-time algorithm exists, in practice, efficient satisfiability solvers have been developed, solving formulas with millions of variables and clauses from milliseconds to a few seconds 39 .

The resonance assignment problem is a subgraph isomorphism problem, which is also NP-complete 13 . MAUS reduces resonance assignment to a boolean satisfiability problem, exploiting the power of satisfiability solvers to explore the space of all possible assignment solutions exhaustively. Specifically, given an instance of a resonance assignment problem, we construct a boolean formula introducing the following variables:

X ( i, j ): a boolean variable declaring whether 2D peak i is assigned to methyl j in the input structure.

Y ( k , l ): a boolean variable declaring whether NOE peak k is clustered to 2D peak l .

Z ( k , k ′): a boolean variable declaring whether NOE peaks k and k ′ are truly symmetric.

Next, MAUS combines these variables into a CNF formula imposing the following hard constraints:

For every 2D peak, exactly one X ( i , j ) is true.

For every methyl in the structure, at most one X ( i , j ) is true.

For every NOE, exactly one Y ( k , l ) is true.

If Z ( k , k ′) is true, then in the symmetrization graph k and k ' must be matched in a maximum matching (MM; outlined below).

If Z ( k , k ′) is true, then the methyl assignments of the 2D projections of NOE peaks k and k ′ must be connected by an appropriate edge in the structure graph, G .

In summary, MAUS casts the methyl assignment subgraph isomorphism problem as a system of Boolean constraints captured by a CNF formula and uses CryptoMiniSAT 39 , a general-purpose solver that provides either (i) a single graph-matching solution at a millisecond timescale on a single CPU or (ii) a mathematical proof that the formula is unsatisfiable. As there can be multiple mappings consistent with the input constraints, MAUS utilizes the solver recursively to enumerate all valid solutions (see below). In particular, MAUS truncates the solution space of the formula, imposing further restrictions on the range of a node, and then recursively checking for novel solutions in the restricted formula. Due to the fact that it uses a custom-built CNF formula, MAUS has the flexibility to add or remove constraints from the formula and perform 1000s of calculations of the subgraph isomorphism problem in a feasible time frame.

Using these two principles of (1) employing a custom-based formula encoding our system of hard constraints and (2) leveraging a fast and efficient solver, MAUS can perform an exhaustive enumeration of the space of solutions, despite the NP-completeness of the problem, using the following iterative ansatz:

Obtain a single valid mapping of peaks into methyls (validity is imposed by the hard constraints) from the solver.

Starting from this valid mapping, consider an arbitrary peak (p1) that has been assigned to a particular methyl (m1) in (i) and add a temporary clause to the propositional CNF formula that p1 cannot be assigned to m1.

Invoke the satisfiability solver a second time to see if it can return an alternative solution for that peak. If it can, then the CNF formula is modified again to include the new solution to the list of excluded assignments and this process is repeated until the solver reaches unsatisfiability (i.e., there are no more assignments of p1 that can lead to a mapping, which satisfies the CNF formula).

Add hard constraints imposing that p1 can only be assigned to exactly one of the valid options and repeat steps (i) through (iii) to enumerate the assignment options for a different peak, until all peaks have been considered.

Repeat step (iv) until all peaks have been considered.

Through this iterative process, MAUS achieves full enumeration of the peak support sets (containing all valid methyl assignments) in a time that is proportional to the sum of the support set sizes, without resolving to the use of heuristics. If no solutions exist, MAUS must return a proof that all possible assignments have been considered. This method is akin to the celebrated Davis–Putnam–Logemann–Loveland procedure for the satisfiability problem 40 , 41 .

Reducing complex components of the symmetrization graph

MAUS reduces the ambiguity arising from spectral overlap in the NOESY data, which is formally described in the form of a bipartite symmetrization graph, allowing it to consider all possible NOE connectivities that can be derived from the raw NOE peaks without any manual effort by the user (Fig.  1d ). Simple components of the graph, containing up to two edges, directly impose satisfiability constraints. For each complex component (i.e., a set of nodes containing at least three edges), MAUS first generates all possible MMs using a O(N 3 ) algorithm efficiently 42 . Every MM is iteratively examined, introducing additional clauses to the CNF formula and running the satisfiability solver. If MAUS encounters an edge, which causes the formula to become unsatisfiable, then this edge is eliminated from all MMs, repeating until no further edges can be eliminated. Thus, the bipartite graph-matching process is iterated many times as an “outer loop”, providing possible constraints to the CNF solver, which serves to eliminate some of the possible matchings. This allows us to identify a maximal, self-consistent set of upper/lower-diagonal NOEs, which is used to provide the final constraints to the solver, completely removing this burden from the user. During this process, complex components decompose into simple components, recovering additional NOE connectivities into H (Supplementary Table  7 ). The small fraction of NOE peaks, which remain within complex components, are not used by MAUS.

Chemical shift-based residue type classifier

To predict the residue type of a given 2D reference peak when this information is not available from additional experiments, we have analyzed assigned chemical shift data in the BMRB. In particular, we have derived correlated 13 C, 1 H chemical shift distributions for each methyl atom belonging to Met, Ala, Leu, Val, and Ile, using all assigned resonances in the database. We then constructed a table associating 13 C and 1 H pairs with the frequency of each residue type, ranked in decreasing order. A 2D peak can be tentatively assigned to a minimum set of residues, such that their cumulative frequency is ≥99%. Using our benchmark targets, we have evaluated the performance of the classifier for Leu and Val residues, as the resonances of all other residues can be unambiguously determined from their position on two 2D spectra recorded using complementary (MA)ILV and ILV proS samples (Supplementary Table  4 ).

Inputs to MAUS

MAUS accepts as inputs, a user-annotated 2D HMQC list, two 3D NOESY lists recorded using long (300 ms) and short (50 ms) mixing times or a 4D NOESY list recorded using long mixing time (300 ms) and a PDB file. In this work, the experimental peak lists were picked manually using a S/N threshold of 5 and guided by the 2D reference spectrum. When two different users in our group picked the same data sets, they arrived at exactly the same assignment result using MAUS, suggesting that this approach is robust to any biases introduced by the user. This simple process allows us to construct NOE peak tables for each target in under 1 h.

The user can input a crystal structure, an NMR ensemble or an ensemble of models computed using structure prediction methods. MAUS also supports oligomeric systems with regular symmetry, by considering subunit interfaces between different chains in the input PDB file, as specified by the user. Each 2D HMQC peak in the input must be provided in a custom format which specifies residue type(s), a unique number representing the peak, geminal connectivity and stereospecificity information (for peaks corresponding to Leu and Val). To ensure consistency, the user must provide the NMR construct sequence (used to confirm that all methyls present in the NMR samples are also present in the input PDB file) and the labeling scheme used to prepare the NMR sample. The choice of methyl labeling scheme is target-specific and should be optimized such as to achieve the maximum number of probes, while retaining a well-resolved 2D methyl spectrum. Although our manuscript focuses on the most commonly used ILV scheme, MAUS also supports the methyl-bearing side chains of Ala and Met. The user may instruct MAUS to use custom radii (upper limits) for short and long mixing time NOEs (see Table  1 and “Methods” for parameters used for our benchmark targets). The radii must be chosen within recommended ranges (6–8 Å for short-range and 10–12 Å for long-range NOEs). In practice, the smallest possible distance threshold that defines a structure graph that is subgraph isomorphic to the data graph is the optimal distance threshold to run MAUS. A rational approach for choosing this distance is outlined here as follows: (i) we first run MAUS using a standard 8 Å distance threshold for the 50 ms NOESY data, which we assume to be the maximum possible 50 ms NOE distance. This assumption holds for all targets used in this study, also considering that the structure graph already takes into account alternative side-chain rotamers and other local variations of the structure. (ii) We then reduce the threshold by steps of 0.1 until MAUS can no longer fit the NOE network into the structure graph. (iii) The final distance threshold is either the minimum distance identified in ii augmented by 0.5 or 8 Å (we select the smaller value). We find that, for all targets, a threshold in the range 6–8 Å gives optimum results, while always retaining the ground truth within the solution range. This strategy for finding the optimal threshold is carried out manually using a series of short preliminary MAUS runs on our online web-server (in cases where MAUS cannot fit the NOE network, a descriptive message is returned to the user immediately). Running all targets using a conservative 8 Å threshold for the 50 ms NOE data still provides valid and meaningful results, with assignment completeness levels (measured in terms of 1–3 assignment options) that decrease by 3% for HLA-A02, 6% for MBP, 40% for IL-2, and 26% for REC2 targets in our benchmark set, relative to the user-optimized thresholds (Supplementary Table  8 ). MAUS also employs tunable tolerance parameters, for diagonal NOEs, clustering, and symmetrization, which are set to default values mentioned in previous sections. Detailed instructions for submitting jobs and creating input files in the required format for MAUS are outlined on the help page of our website.

Inconsistencies in peak residue-type information or geminal methyl connectivities in the inputs provided by the user (such as misidentification of a Leu peak as a Val or misidentification of geminal methyls) will, in most cases, lead to an unsatisfiable system of constraints due to inconsistency with the global NOE network. To avoid this and guide the user towards obtaining confident assignments, in the MAUS input page we recommend that the user should only provide residue type and geminal information if they have 100% confidence and are guided by conclusive evidence from a range of additional experimental inputs (Fig.  2 ) (residue type and proS-selective labeling, 2D experiments that achieve selective inversion of Leu peaks further assisted by the observation of strong geminal NOEs). We find that, when provided with this information, determination of geminal methyls and residue types can be achieved with 100% confidence by the user as shown by our results on eight benchmark and blind targets. Alternatively, if the user is not confident in the residue type and geminal connectivities, our online interface fully supports a mixed input, where for some methyls this information is either not given or a list of options is provided (e.g., Leu or Val). Finally, the user also has the option to engage the MAUS residue type and geminal methyl classifier, which provides 99% accurate results. Together, these options effectively address any possible inconsistencies in the residue type/geminal methyl user inputs.

Inconsistencies in the input NOE peak lists, due to spectral artifacts, errors in peak picking, which may also include incorrectly identified peak centers. To address this type of data discrepancy, we recommend that users pick 3D or 4D NOESY data at sufficiently high S/N > 5 and remove spectral artifacts manually. However, the most critical form of proactively removing inconsistent data points is provided by our rigorous clustering and symmetrization procedures, which requires that every valid NOE constraint: (i) corresponds to a single methyl group on the 2D 13 C/ 1 H reference HMQC spectrum, through the explicit consideration of all possible clustering scenarios, and (ii) arises from two exclusively symmetry-related cross-peaks, through the construction of an explicit bipartite graph between upper- and lower-diagonal NOEs (Fig.  1c, d ). The information of which NOE peaks were not considered due to violating either 1 or 2 is provided back to the user in a comprehensive table.

MAUS output

MAUS provides a detailed output file with all valid resonance assignment options for each methyl resonance, together with an explanation of how each NOE cross-peak was used by the method. The output file provides information about all non-diagonal NOEs: the 2D peak(s) to which each NOE peak was clustered and its symmetry-related peak. If the NOE peak is not used as a constraint either due to no identifiable symmetric peaks or because it is part of a complex component, which cannot be reduced further, it is indicated in the output file so that the user can revisit the data. In addition, all NOEs corresponding to geminal methyl connectivities are indicated in the output file. This classification summarizes how the NMR data was utilized by MAUS (see Supplementary Table  7 for all benchmark targets used in this study). In the final section, all the resonance assignment options are listed together with a summary of assignment statistics. The total % of resonance assignments for all targets reported here is computed by taking the sum of unambiguous and ambiguous (one to three options) options. If MAUS cannot find a satisfying assignment due to erroneous input data, it provides helpful diagnostic information for the user to troubleshoot their inputs. The input files for all targets in our benchmark set together with detailed description are provided as tutorials in the help page of the MAUS website. MAUS’ exhaustive enumeration process, by definition, does not include methyls that are uniquely assigned to one resonance as options in the assignments lists of any other groups with more than one options, due to the use of explicit constraints in the CNF formula, which prevent this from happening. When a single resonance assignment option is provided to the user, this means that MAUS has explored all possible solutions of the graph-matching problem globally and has found that this is the only option that is consistent with the input NOE data graph and any other constraints supplied by the user. This principle provides a powerful tool for deriving highly complete and 100% accurate assignments as follows. For cases with two or three options, the user can manually resolve the correct assignment aided by intensity considerations through inspection of interproton distances in the X-ray or NMR structure (intensities are not considered by MAUS). Here, the user can also consider NOEs that have been ignored by MAUS due to high peak overlap or unclear symmetry, as marked on the annotated output peak table. Methyls for which the correct assignment can be confidently determined by the user can be then fixed in the MAUS input, to trigger a next round of automated assignments. We find that this iterative procedure “propagates” constraints through the data graph, leading to resolution of all assignments in a matter of a few hours while requiring minimal intervention by the user, even for systems with highly complex spectra.

Transform-restrained Rosetta

We used the trRosetta protocol, which makes use of the orientation restraints predicted using deep residual neural networks, together with distance restraints from co-evolution information to model structures at a high resolution in the Rosetta force field 25 . We provided as input the sequence of MBP to trRosetta using a web-server available at https://yanglab.nankai.edu.cn/trRosetta/ . An ensemble of five models output by trRosetta was supplied to MAUS together with our 2D and 3D NMR data, to obtain the resonance assignments.

Comparison with publicly available assignment algorithms using NOE peak lists

Protons were added to X-ray structures using the NIH PDB Utility Server ( https://spin.niddk.nih.gov/bax/nmrserver/pdbutil/sa.html ). In all simulations, the symmetrization tolerance was set to 0.15, 0.15, and 0.02 p.p.m. for the 13 C NOE , 13 C indirect , and 1 H direct dimensions, respectively, except Cas9 HNH domain, for which 0.1 p.p.m. was used for 13 C NOE and 13 C indirect . In all simulations, the long mixing time NOE distance threshold was set to 10 Å, whereas the short mixing time NOE distance threshold was optimized for different targets in the 6–8 Å range (see Table  1 ).

The 2D input lists for MAGIC were generated from the NMRFAM-SPARKY-exported 2D 1 H- 13 C HMQC list and the 50 ms 3D C M -C M -H M NOESY list (used to automatically determine geminal pairing). The resulting 2D list was curated for accuracy regarding both methyl residue type and geminal pairing.

The experimental chemical shift file (methyl.exp) was generated from the NMRFAM-SPARKY-exported 2D 1 H- 13 C HMQC list using an in-house script. Predicted chemical shifts (methyl.pre) were determined using CH3Shift. Linewidths in the 300 ms 3D C M -C M -H M NOESY data were determined using NMRFAM-SPARKY or CcpNmr. FLAMEnGO v2.4 was used. A total of 100,000 Monte Carlo rounds were performed for each calculation. The final result was taken as the summary of 10 independent runs.

The experimental chemical shift file (methyl.exp) was generated from the NMRFAM-SPARKY-exported 2D 1 H- 13 C HMQC list using an in-house script. Predicted chemical shifts (methyl.pre) were determined using CH3Shift. The weight of chemical shift to NOE was kept as 0.2 (the default). The final result was taken as the summary of ten independent runs.

The 2D and 3D input peak lists were picked manually in NMRFAM-SPARKY and were used to generate inputs for MethylFLYA 12 . For every target, we carried out 100 independent calculations using three different distance cutoff values (4.5, 5.0, and 5.5 Å) on a single processor to generate expected NOESY peaks from the input structure. Consistent with the parameters used to run all methods, we used 13 C and 1 H chemical shift tolerance values of 0.15 and 0.02 p.p.m., respectively, for all targets with the exception of HNH, where a 13 C tolerance of 0.1 p.p.m. was used. Consensus results from all runs for assignments classified as strong by MethylFLYA (considered reliable) are reported in Table  2 .

Reporting summary

Further information on research design is available in the  Nature Research Reporting Summary linked to this article.

Data availability

Nuclear Magnetic Resonance assignments for IL-2, REC2, HNH, and REC3, used as blind targets, have been deposited in the BMRB under accession numbers 28104, 28105, 28106, and 28110, respectively. The unprocessed and processed NMR data along with the structural ensembles and input parameters used to run MAUS are provided for all targets in the help page of the MAUS website. Other data are available from the corresponding author upon reasonable request.

Code availability

The MAUS server is available at https://methylassignment.chemistry.ucsc.edu .

Rosenzweig, R. & Kay, L. E. Bringing dynamic molecular machines into focus by methyl-TROSY NMR. Annu. Rev. Biochem. 83 , 291–315 (2014).

Article   CAS   Google Scholar  

Tugarinov, V., Hwang, P. M., Ollerenshaw, J. E. & Kay, L. E. Cross-correlated relaxation enhanced 1 H− 13 C NMR spectroscopy of methyl groups in very high molecular weight proteins and protein complexes. J. Am. Chem. Soc. 125 , 10420–10428 (2003).

Ruschak, A. M. & Kay, L. E. Methyl groups as probes of supra-molecular structure, dynamics and function. J. Biomol. NMR 46 , 75–87 (2010).

Sprangers, R. & Kay, L. E. Quantitative dynamics and binding studies of the 20S proteasome by NMR. Nature 445 , 618–622 (2007).

Kay, L. E., Ikura, M., Tschudin, R. & Bax, A. Three-dimensional triple-resonance NMR spectroscopy of isotopically enriched proteins. J. Magn. Reson. 89 , 496–514 (1990).

ADS   CAS   Google Scholar  

Tugarinov, V. & Kay, L. E. Ile, Leu, and Val methyl assignments of the 723-residue malate synthase G using a new labeling strategy and novel NMR methods. J. Am. Chem. Soc. 125 , 13868–13878 (2003).

Rossi, P., Xia, Y., Khanra, N., Veglia, G. & Kalodimos, C. G. 15N and 13C- SOFAST-HMQC editing enhances 3D-NOESY sensitivity in highly deuterated, selectively [1H,13C]-labeled proteins. J. Biomol. NMR 66 , 259–271 (2016).

Mas, G. et al. Structural investigation of a chaperonin in action reveals how nucleotide binding regulates the functional cycle. Sci. Adv. 4 , eaau4196 (2018).

Article   ADS   Google Scholar  

Monneau, Y. R. et al. Automatic methyl assignment in large proteins by the MAGIC algorithm. J. Biomol. NMR 69 , 215–227 (2017).

Xu, Y. & Matthews, S. MAP-XSII: an improved program for the automatic assignment of methyl resonances in large proteins. J. Biomol. NMR 55 , 179–187 (2013).

Chao, F.-A. et al. FLAMEnGO 2.0: an enhanced fuzzy logic algorithm for structure-based assignment of methyl group resonances. J. Magn. Reson. 245 , 17–23 (2014).

Article   ADS   CAS   Google Scholar  

Pritišanac, I., Würz, J. M., Alderson, T. R. & Güntert, P. Automatic structure-based NMR methyl resonance assignment in large proteins. Nat. Commun. 10 , 4922 (2019).

Pritišanac, I. et al. Automatic assignment of methyl-NMR spectra of supramolecular machines using graph theory. J. Am. Chem. Soc. 139 , 9523–9533 (2017).

Article   Google Scholar  

McGregor, J. J. Backtrack search algorithms and the maximal common subgraph problem. Softw. Pract. Exp. 12 , 23–34 (1982).

Cordella, L. P., Foggia, P., Sansone, C. & Vento, M. A (sub)graph isomorphism algorithm for matching large graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26 , 1367–1372 (2004).

Orts, J., Vögeli, B. & Riek, R. Relaxation matrix analysis of spin diffusion for the NMR structure calculation with eNOEs. J. Chem. Theory Comput. 8 , 3483–3492 (2012).

Leaver-Fay, A. et al. In Methods in Enzymology vol. 487, 545–574 (Elsevier, 2011).

Behera, S. P. et al. Nearest-neighbor NMR spectroscopy: categorizing spectral peaks by their adjacent nuclei. Nat. Commun. 11 , 5547 (2020).

Doudna, J. A. & Charpentier, E. The new frontier of genome engineering with CRISPR-Cas9. Science 346 , 1258096 (2014).

Belato, H. B., East, K. W. & Lisi, G. P. 1H, 13C, 15N backbone and side chain resonance assignment of the HNH nuclease from Streptococcus pyogenes CRISPR-Cas9. Biomol. NMR Assign. 13 , 367–370 (2019).

East, K. W. et al. Allosteric motions of the CRISPR–Cas9 HNH nuclease probed by NMR and molecular dynamics. J. Am. Chem. Soc. 142 , 1348–1358 (2020).

Song, Y. et al. High-resolution comparative modeling with RosettaCM. Structure 21 , 1735–1742 (2013).

Marks, D. S., Hopf, T. A. & Sander, C. Protein structure prediction from sequence variation. Nat. Biotechnol. 30 , 1072–1080 (2012).

Tang, Y. et al. Protein structure determination by combining sparse NMR data with evolutionary couplings. Nat. Methods 12 , 751–754 (2015).

Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117 , 1496–1503 (2020).

Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)—round XIII. Proteins Struct. Funct. Bioinformatics 87 , 1011–1020 (2019).

Wei, K. Y. et al. Computational design of closely related proteins that adopt two well-defined but structurally divergent folds. Proc. Natl Acad. Sci. USA 117 , 7208–7215 (2020).

McShan, A. C. et al. Peptide exchange on MHC-I by TAPBPR is driven by a negative allostery release cycle. Nat. Chem. Biol. 14 , 811–820 (2018).

Gardner, K. H. & Kay, L. E. Production and incorporation of 15 N, 13 C, 2 H (1 H-δ1 methyl) isoleucine into proteins for multidimensional NMR studies. J. Am. Chem. Soc. 119 , 7599–7600 (1997).

Chen, J. S. et al. Enhanced proofreading governs CRISPR–Cas9 targeting accuracy. Nature 550 , 407–410 (2017).

D’Astolfo, D. S. et al. Efficient intracellular delivery of native proteins. Cell 161 , 674–690 (2015).

Wright, A. V. et al. Rational design of a split-Cas9 enzyme complex. Proc. Natl Acad. Sci. 112 , 2984–2989 (2015).

De Paula, V. S. et al. Interleukin-2 druggability is modulated by global conformational transitions controlled by a helical capping switch. Proc. Natl Acad. Sci. USA 117 , 7183–7192 (2020).

Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6 , 277–293 (1995).

Lee, W., Tonelli, M. & Markley, J. L. NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy. Bioinformatics 31 , 1325–1327 (2015).

Vranken, W. F. et al. The CCPN data model for NMR spectroscopy: Development of a software pipeline. Proteins Struct. Funct. Bioinformatics 59 , 687–696 (2005).

Gans, P. et al. Stereospecific isotopic labeling of methyl groups for NMR spectroscopic studies of high-molecular-weight. Proteins Angew. Chem. Int. Ed. 49 , 1958–1962 (2010).

Tyka, M. D. et al. Alternate states of proteins revealed by detailed energy landscape mapping. J. Mol. Biol. 405 , 607–618 (2011).

Soos, M., Nohl, K. & Castelluccia, C. In Theory and Applications of Satisfiability Testing - SAT 2009 (ed. Kullmann, O.) vol. 5584, 244–257 (Springer, Berlin, Heidelberg, 2009).

Davis, M. & Putnam, H. A computing procedure for quantification theory. J. ACM 7 , 201–215 (1960).

Article   MathSciNet   Google Scholar  

Davis, M., Logemann, G. & Loveland, D. A machine program for theorem-proving. Commun. ACM 5 , 394–397 (1962).

Galil, Z. Efficient algorithms for finding maximum matching in graphs. ACM Comput. Surv. CSUR 18 , 23–38 (1986).

Tugarinov, V., Kanelis, V. & Kay, L. E. Isotope labeling strategies for the study of high-molecular-weight proteins by solution NMR spectroscopy. Nat. Protoc. 1 , 749–754 (2006).

Download references

Acknowledgements

We acknowledge Alison Lindberg and the ITS/ADC staff at UCSC for their assistance in setting up the MAUS website, and Ben Sherman for sharing his python implementation of the satisfiability algorithm designed by Dimitris Achlioptas. This research was supported through NIGMS (R35GM125034) and NIAID (R01AI143997) grants to N.G.S., and through a High-End Instrumentation (HIE) Grant (S10OD018455), which funded the 800 MHz NMR spectrometer at UCSC.

Author information

These authors contributed equally: Santrupti Nerli, Viviane S. De Paula.

Authors and Affiliations

Department of Biomolecular Engineering, University of California, Santa Cruz, CA, 95064, USA

Santrupti Nerli

Center for Computational and Genomic Medicine, Department of Pathology and Laboratory Medicine, The Children’s Hospital of Philadelphia and Department of Biochemistry and Biophysics, Perelman School of Medicine, University of Pennsylvania, 3501 Civic Center Boulevard, Philadelphia, PA, 19104, USA

Santrupti Nerli, Viviane S. De Paula, Andrew C. McShan & Nikolaos G. Sgourakis

Núcleo Multidisciplinar de Pesquisa em Biologia, Universidade Federal do Rio de Janeiro, Duque de Caxias, RJ, 25245-390, Brazil

Viviane S. De Paula

You can also search for this author in PubMed   Google Scholar

Contributions

N.G.S. designed the project. N.G.S. and S.N. designed and S.N. implemented the MAUS web-interface. N.G.S., V.S.D.P., and A.C.M. recorded and analyzed NMR experiments. N.G.S., S.N., and V.S.D.P. wrote the paper, with feedback from all authors.

Corresponding author

Correspondence to Nikolaos G. Sgourakis .

Ethics declarations

Competing interests.

N.G.S. is listed as a co-inventor in a US Patent (US10871459B2) related to this work. The other authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information, peer review file, reporting summary, rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Nerli, S., De Paula, V.S., McShan, A.C. et al. Backbone-independent NMR resonance assignments of methyl probes in large proteins. Nat Commun 12 , 691 (2021). https://doi.org/10.1038/s41467-021-20984-0

Download citation

Received : 02 July 2020

Accepted : 09 December 2020

Published : 29 January 2021

DOI : https://doi.org/10.1038/s41467-021-20984-0

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

This article is cited by

Labeling of methyl groups: a streamlined protocol and guidance for the selection of 2h precursors based on molecular weight.

  • Alexandra Locke
  • Kylee Guarino
  • Gordon S. Rule

Journal of Biomolecular NMR (2024)

  • Anthony C. Bishop
  • Glorisé Torres-Montalvo
  • A. Joshua Wand

Nature Communications (2023)

Conformational plasticity of RAS Q61 family of neoepitopes results in distinct features for targeted recognition

  • Andrew C. McShan
  • David Flores-Solis
  • Nikolaos G. Sgourakis

Structural mechanism of tapasin-mediated MHC-I peptide loading in antigen presentation

  • Jiansheng Jiang
  • Daniel K. Taylor
  • Kannan Natarajan

Nature Communications (2022)

TAPBPR employs a ligand-independent docking mechanism to chaperone MR1 molecules

  • Christine A. Devlin

Nature Chemical Biology (2022)

By submitting a comment you agree to abide by our Terms and Community Guidelines . If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Quick links

  • Explore articles by subject
  • Guide to authors
  • Editorial policies

Sign up for the Nature Briefing: AI and Robotics newsletter — what matters in AI and robotics research, free to your inbox weekly.

nmr backbone assignment

Backbone and methyl side-chain resonance assignments of the Fab fragment of adalimumab

  • Open access
  • Published: 26 June 2024

Cite this article

You have full access to this open access article

nmr backbone assignment

  • Muzaddid Sarker 1 , 2 &
  • Yves Aubin 1 , 3  

Adalimumab is a therapeutic monoclonal antibody developed to target human TNF an important mediator of immune-mediated inflammatory diseases such as rheumatoid arthritis, amongst others. The 48 kDa Fab fragment of adalimumab was produced in Escherichia coli using a single chain approach to allow complete isotopic incorporation of deuterium, carbon-13 and nitrogen-15 along with the protonated isoleucine-d, valine and leucine methyl groups. Here we report the near complete resonance assignment of the polypeptide backbone and the methyl groups of isoleucine, leucine and valine residues.

Avoid common mistakes on your manuscript.

Biological context

Brief biology on tnf.

Tumor necrosis factor (TNF) also known as TNF-alpha is a cytokine released from cells as a soluble trimeric protein (sTNF) from its transmembrane form (tmTNF) by TNF-alpha-converting enzyme (TACE). When it is present at low concentrations, it stimulates host defense mechanisms to fight infections that may follow organ injury. However, at high concentrations, it can promote inflammation and organ injury (Tracey et al. 2008 ). Other cytokines, such as interleukin-1and interleukin-17 also have proinflammatory properties. Abnormal or excessive production of TNF can induce a variety of immune-mediated inflammatory diseases such as rheumatoid arthritis (RA), inflammatory bowel disease (IBD) psoriatic arthritis (PsA), psoriasis (PS) ankylosing spondylitis (AS), and non-infectious uveitis (NIU). TNF can both mediate a variety of direct pathogenic effects and induce the production of other mediators of inflammation (interleukins and other cytokines) and tissue destruction. TNF is an important cytokine at the top of the inflammatory cascade as well as part of an intricate pro-inflammatory network of cytokines (Tracey et al. 2008 ).

The first TNF antagonist biologic drug was a monoclonal anti-TNF antibody named infliximab (brand name Remicade®). Many studies (Choy and Panayi 2001 ; Tracey et al. 2008 ; Jang et al. 2021 ) using this mAb confirmed the central role of TNF in the complex biology of the diseases mentioned earlier. Infliximab is a chimeric therapeutic mAb where the antigen binding fragment (Fab) has its variable domains derived from mouse while the constant regions are human. Such chimeric mAbs have a higher tendency to elicit the production of neutralizing antibodies that, with time, will render the drug less or not effective at all. Adalimumab (brand name Humira® (Abbvie), and its biosimilars Amgevita (Amgen); Cyltezo (Boehringer Ingelheim); Abrilada (Pfizer); Imgraldi (Samsung) and more that are in clinical trials) is a human mAb of the immunoglobulin G1 (IgG1) class and was developed to alleviate (mitigate) this problem. Here we present the backbone and side chain methyl group of isoleucine, leucine and valine residues chemical shifts of the single chain Fab fragment of adalimumab.

Methods and experiments

Expression and purification of adalimumab-fab.

The amino acid sequence used for the production of samples of labelled adalimumab-scFab has been described in details previously (Gagné et al. 2023 ). The heavy chain of adalimumab Fab domain and a portion of the hinge region (see (Hodgson et al. 2019 ) for complete amino acid sequence), residues Glu1 to Pro234 where Cys233 has been mutated to Ala233, was linked to residues Asp1 to Cys214 of the light chain via a linker made of five (GGGGS) elements plus SSGLVPRGS. The last residues of the linker contain a thrombin recognition site (LVPRGS). A poly-histidine tag (MGSSHHHHHH HHHHSSGHMLVPRGS) was fused to the amino terminal of this polypeptide. Papain cleavage, normally takes place between His2289 and Thr229, completely removed the fusion tag and the linker while leaving the thrombin recognition (LVPRGS) site at the N-terminal end of the light chain at the expense of great sample loss (80–90%). Therefore, we initially carried out the backbone and side-chain assignment of histag-adalimumab-scFab fragment, then produced a cleaved sample for final resonance assignment with the following sequence, where the extra residues (LVPRGS) were numbered starting at (−5):

figure a

Expression of labeled 2 H– 13 C– 15 N-histag-adalimumab-scFab was carried out exactly as described previously (Gagne et al. 2024 ) with the exception that cells were harvested after 18 h post induction. The sample for resonance assignment contained 145 µM 2 H– 13 C– 15 N-adalimumab-Fab (7 mg/mL) in 20 mM sodium acetate-d3 at pH 5.0 with 5% v/v deuterium oxide for lock frequency purposes in 50 µL and transferred in a 1.7 mm tube. Production of labeled 2 H– 13 C– 15 N– 1 H–methyl-(Ile, Leu, Val)-histag-adalimumab-scFab was conducted as described previously (Gagne et al. 2024 ). The final sample contained 400 µM 1 H–I(d1)LVmethyl-histag– 2 H– 13 C– 15 N-histag-adalimumab-scFab in 300 µL (5 mm Shigemi tube) in the same buffer.

NMR experiments

Data were collected at 40 ℃ (313 K) on Bruker AVANCE NEO 600 MHz (side-chain methyls assignment), AVANCE III-HD 700 MHz (backbone assignment) and AVANCE II 900 MHz NMR spectrometers equipped with 5 mm, 1.7 mm, and 5 mm, respectively, TCI cryogenically cooled triple resonance inverse probeheads fitted with z-axis gradients. Chemical shift resonances were referenced with sodium 2,2-dimethyl-2-silapentane-5-sulfonate (DSS). Data collection for the assignment of the backbone and side chain methyl resonances was carried out exactly as described previously using the same pulse program and parameters (Gagne et al. 2024 ).

Data analysis and resonance assignment and validation

NMR data were processed using nmrPipe (Delaglio et al. 1995 ). Sequential assignment was carried out manually using POKY (Lee et al. 2021 ).

Validation of NMR assignment

The web server of I-PINE ( http://i-pine.nmrfam.wisc.edu/index.html ) (Lee et al. 2019 ) was used initially through the POKY interface to verify and validate the adalimumab-Fab assignments. All peak lists from TROSY-based NMR experiments, namely 2D- 15 N-HSQC, 3D-HNCO, HN(CA)CO, HNCA, HN(CO)CA, HNCACB, and HN(CO)CACB, were used as input files. Analysis of the I-PINE output showed near-complete correctness of the manual assignments and identified a very few new assignments. Upon completion of manual assignments, we tested the new assignment protocol BARASA (Bishop et al. 2023 ) running under NMRBox using the same peak list and assignments as input data used for I-PINE. A total of 3 BARASA rounds were conducted, using 80 concurrent threads with 0.99 convergence p-value, and a stepwise energy drop of −500. Ca, Cb, and CO zero points were all set to 0.20, with a chemical shift energy range of −100 to 100.

Extent of assignments and data deposition

Resonance assignment of backbone atoms.

We compared the 2D– 1 H– 15 N HSQC of histag-adalimumab-scFab with the fully cleaved adalimumab-Fab. The extra resonances belonging to the histag and linker were well resolved from any backbone resonances of the Fab while all backbone resonances of both samples were overlapping with each other, indicating that the assignment of the histag-scFab can be directly used for the Fab. Considering that the 1.7 mm cryoprobe only requires a very small sample size (40 mL), we carried out resonance assignment of the backbone resonances on a sample of 2 H– 13 C– 15 N-adalimumab-Fab. On the other hand, side-chain methyl assignment was carried out using an uncleaved sample. The Fab fragment contained a total of 442 amino acids (47.7 kDa) with the heavy chain having 228 residues (11 prolines) and the light chain having 214 residues (11 prolines).

The two-dimensional TROSY-HSQC spectrum of 2 H– 13 C– 15 N-adalimumab-Fab shows well-dispersed resonances, typical of well-folded Fab (Fig.  1 ). A total of 381 (90.7%) 1 H– 15 N backbone peaks were assigned, with 183 (84.3%) and 198 (97.5%) in the heavy and light chains, respectively. Assigned carbons include 407 (92.1%) 13 C–O, 407 (92.1%) 13 Cα, and 376 (91.7%) non-glycine 13 Cβ. The Fab fragment is composed of four immunoglobulin domains that are each stabilized by one disulfide bond: Cys22-Cys96 (heavy chain, VH), Cys148-Cys204 (heavy chain, CH1), Cys23-Cys88 (light chain, VL), Cys134-Cys194 (light chain, CL), and one bond that links the heavy to the light chain Cys224-Cys214. Analysis of the cysteine Cβ chemical shifts indicates that their values (higher than 35 ppm) is consistent with properly formed disulfide bond. (Schulte et al. 2020 ).

figure 1

Two-dimensional 1 H– 15 N-TROSY-HSQC spectrum of 2 H– 13 C– 15 N-adalimumab-Fab acquired at 900 MHz at 40 ℃. The backbone amide chemical shift assignment of heavy and light chains is indicated in the blue and red spectra, respectively

Resonance assignment of isoleucine delta -1, leucine and valine methyl groups

Using the carbon tocsy versions of experiments, a total of 11 isoleucines (100%), 27 leucines (88%), and 30 valines (81%) were assigned (Fig.  2 ). Only 4 leucines have not been assigned, (Leu4, Leu108, Leu132 of the heavy chain, and Leu46 of the light chain), and 7 valines (Val5, Val37, Val 99, Val129, Val192 heavy chain and Val58 of light chain).

figure 2

Two-dimensional 1 H– 13 C CT-HSQC spectrum of the methyl spectral region of histag-adalimumab-scFab. The spectrum was acquired using a 1 H–I(δ1)LVmethyl– 2 H– 13 C– 15 N-histag-adalimumab-scFab sample, with acquisition at 40 ℃ and 1 GHz, with 28 ms constant-time delay. The methyl chemical shift assignment of isoleucine-δ1, leucine, and valine of heavy and light chains is shown in the blue and red spectra, respectively

Comparison with trastuzumab-scFab and NISTmAb-Fab assignments

Adalimumab, trastuzumab and the NIST-mAb are three monoclonal antibodies of the IgG1 class with light chain kappa with identical amino acid sequences in their constant heavy 1 (C H 1) and constant light domain (C L ). As reported earlier (Gagne et al. 2024 ), we compared the assignment of these domains of trastuzumab-scFab the yNISTmAb-Fab (Solomon et al. 2023 ) that were overlapping to validate our assignments.

Validation of the backbone assignment

I-PINE was first used to validate the manual assignment and to help in identifying non-assigned residues. Out of the total 442 residues of adalimumab Fab, we initially manually assigned 395 residues corresponding to a coverage of 89.4%. I-PINE assigned 401 residues (i.e., only a few extra) and all residues except one of our 395 manual assignments matched the I-PINE results.

We also used BARASA for our manual assignment validation and identification of non-assigned residues. BARASA predicted assignments of 417 residues and we were able to assign additional residues, namely Asn92, Arg93, Ala94 of the light chain and Thr52, Trp53, Asn54, Ser55, Gly56, Ser78, Ser194 of the heavy chain. We also corrected assignments of Val58 of light chain using BARASA. The third round of BARASA confirmed correctness of all assignments.

The resonance assignment of adalimumab will provide a powerful tool for the characterization of Humira® and its biosimilars. In addition, it will facilitate the application of NMR spectroscopy techniques to probe protein dynamics, epitope binding and drug excipients interactions at atomic level.

Data availability

Chemical shifts and Bruker raw data ser files were deposited in the BMRB data bank with entry number 52274.

Abbreviations

Monoclonal antibody.

Single-chain fragment antigen-binding.

Fragment antigen-binding

Heavy chain variable domain.

Light chain variable domain

Bishop AC, Torres-Montalvo G, Kotaru S, Mimun K, Wand AJ (2023) Robust automated backbone triple resonance NMR assignments of proteins using bayesian-based simulated annealing. Nat Commun 14:1556. https://doi.org/10.1038/s41467-023-37219-z

Article   ADS   Google Scholar  

Choy EHS, Panayi GS (2001) Cytokine pathways and joint inflammation in rheumatoid arthritis. N Engl J Med 344:907–916. https://doi.org/10.1056/NEJM200103223441207

Article   Google Scholar  

Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A (1995) NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR 6:277–293

Gagné D, Sarker M, Gingras G, Hodgson DJ, Frahm G, Creskey M, Lorbetskie B, Bigelow S, Wang J, Zhang X et al (2023) Strategies for the production of isotopically labelled Fab fragments of therapeutic antibodies in Komagataella phaffii ( Pichia pastoris ) and Escherichia coli for NMR studies. PLoS ONE 18:e0294406. https://doi.org/10.1371/journal.pone.0294406

Gagne D, Aramini JM, Aubin Y (2024) Backbone and methyl side-chain resonance assignments of the single chain Fab fragment of trastuzumab. Biomol NMR Assign. https://doi.org/10.1007/s12104-024-10177-3

Hodgson DJ, Ghasriani H, Aubin Y (2019) Assessment of the higher order structure of Humira®, Remicade®, Avastin®, Rituxan®, Herceptin®, and Enbrel® by 2D-NMR fingerprinting. J Pharm Biomed Anal 163:144–152. https://doi.org/10.1016/j.jpba.2018.09.056

Jang D-I, Lee AH, Shin H-Y, Song H-R, Park J-H, Kang T-B, Lee S-R, Yang S-H (2021) The role of tumor necrosis factor alpha (tnf-α) in autoimmune disease and current TNF-α inhibitors in therapeutics. Int J Mol Sci. https://doi.org/10.3390/ijms22052719

Lee W, Bahrami A, Dashti HT, Eghbalnia HR, Tonelli M, Westler WM, Markley JL (2019) I-PINE web server: an integrative probabilistic NMR assignment system for proteins. J Biomol NMR 73:213–222. https://doi.org/10.1007/s10858-019-00255-3

Lee W, Rahimi M, Lee Y, Chiu A (2021) POKY: a software suite for multidimensional NMR and 3D structure calculation of biomolecules. Bioinformatics 37:3041–3042. https://doi.org/10.1093/bioinformatics/btab180

Schulte L, Mao J, Reitz J, Sreeramulu S, Kudlinzki D, Hodirnau VV, Meier-Credo J, Saxena K, Buhr F, Langer JD et al (2020) Cysteine oxidation and disulfide formation in the ribosomal exit tunnel. Nat Commun 11:5569. https://doi.org/10.1038/s41467-020-19372-x

Solomon TL, Chao K, Gingras G, Aubin Y, O’Dell WB, Marino JP, Brinson RG (2023) Backbone NMR assignment of the yeast expressed Fab fragment of the NISTmAb reference antibody. Biomol NMR Assign 17:75–81. https://doi.org/10.1007/s12104-023-10123-9

Tracey D, Klareskog L, Sasso EH, Salfeld JG, Tak PP (2008) Tumor necrosis factor antagonist mechanisms of action: A comprehensive review. Pharmacol Ther 117:244–279. https://doi.org/10.1016/j.pharmthera.2007.10.001

Download references

Acknowledgements

Access to the 900 MHz NMR spectrometer was provided by the Government of Canada Ultrahigh-Field NMR Collaboration Platform, operated by the National Research Council of Canada with support from Laboratories Canada, and a consortium of other Canadian Government Departments and Universities. We thank Drs. Michael Johnston and Roger Tam for critical reading of the manuscript.

Open access funding provided by Health Canada library. Not applicable.

Author information

Authors and affiliations.

Centre for Oncology, Radiopharmaceuticals and Research, Biologics and Radiotherapeutic Drugs Directorate, Health Canada, 251 Sir Frederick Banting Driveway, Ottawa, ON, K1A 0K9, Canada

Muzaddid Sarker & Yves Aubin

Department of Chemistry, Queen’s University, 90 Bader Lane, Kingston, ON, K7L 3N6, Canada

Muzaddid Sarker

Department of Chemistry, Carleton University, 1125 Colonel By Drive, Ottawa, ON, K1S 5B6, Canada

You can also search for this author in PubMed   Google Scholar

Contributions

Conceptualization: Yves Aubin; Investigation: Muzaddid Sarker; Formal Analysis: Muzaddid Sarker, Visualization: Muzaddid Sarker, Yves Aubin; Writing-original draft: Muzaddid Sarker, Yves Aubin; Writing-review and editing: Muzaddid Sarker, Yves Aubin; Supervision: Yves Aubin.

Corresponding author

Correspondence to Yves Aubin .

Ethics declarations

Competing interests.

The authors declare that they have no conflict of interest.

Ethical approval

Not applicable.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Sarker, M., Aubin, Y. Backbone and methyl side-chain resonance assignments of the Fab fragment of adalimumab. Biomol NMR Assign (2024). https://doi.org/10.1007/s12104-024-10187-1

Download citation

Received : 26 April 2024

Accepted : 18 June 2024

Published : 26 June 2024

DOI : https://doi.org/10.1007/s12104-024-10187-1

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

  • Monoclonal antibody
  • NMR spectroscopy
  • Fragment antigen binding
  • Escherichia coli
  • Find a journal
  • Publish with us
  • Track your research

Protein NMR

A practical guide, introduction.

Most books on Protein NMR focus on theoretical aspects and pulse sequences with only little space devoted to resonance assignment and structure calculations. At the same time many software manuals provide detailed information on how to use the software, but assume prior knowledge of the concepts of assignment and structure calculation. This has produced a gap in this area which these webpages aim to bridge by describing the concepts of assignment in detail with the help of many illustrations. Much space and discussion is devoted to practical aspects.

The implementation of protein NMR assignment is described using the program CCPNmr Analysis . This program has been developed by CCPN and actively seeks input from the NMR community. CCPNmr Analysis is based on the detailed and well thought-out CCPN Data Model which has the advantage (a) that it feeds directly into the CCPN Format Converter thus simplifying the import from and export to other programs, and (b) that as more and more NMR-related programs adopt the CCPN Data Model it is likely to take on a key role in NMR data management – in a similar way to CCP4 for protein X-ray crystallography. CCPNmr Analysis is already one of the best assignment programs available while still being developed and provides excellent support via the CCPN Mailing List (a manual is also available). (Although I now work for the CCPN group, these webpages and my recommendation to use this program far predate this!)

Webpages include:

  • description of several resonance assignment strategies
  • simple descriptions and discussions of many multidimensional NMR experiments commonly used in protein NMR
  • basic usage of CCPNmr Analysis (versions 1 and 2 )
  • how to make publication quality figures using CCPNmr Analysis
  • advice for using CCPNmr Analysis with solid-state MAS NMR data
  • tutorial on protein assignment using solid-state MAS NMR data
  • description of isotopic labelling strategies commonly used in protein NMR
  • links to a large number of protein NMR software packages
  • suggested literature for further reading
  • links to other useful NMR webpages

COMMENTS

  1. Triple Resonance Backbone Assignment

    Standard triple resonance backbone assignment of proteins is based on the CBCANNH and CBCA (CO)NNH spectra. The idea is that the CBCANNH correlates each NH group with the Cα and Cβ chemical shifts of its own residue (strongly) and of the residue preceding (weakly). The CBCA (CO)NNH only correlates the NH group to the preceding Cα and Cβ ...

  2. Robust automated backbone triple resonance NMR assignments of ...

    The combination of fast and robust backbone resonance assignments with structure-based methyl resonance assignments 43,44,45,46,47,48 will reduce the resonance assignment barrier considerably and ...

  3. 6.2: Heteronuclear 3D NMR- Resonance Assignment in Proteins

    Figure VI.2.D An example of combined use of HNCACB and CBCA(CO)NH experiments for the backbone NMR resonance assignment in proteins. Cα and Cα labels are color coded: blue for intra-residual signals and green for preceding carbons (Cα-1, Cβ-1). HNCACB contours are color-coded: black for positive signals (Cα) and red for negative ones (Cβ).

  4. Backbone assignment of perdeuterated proteins by solid-state NMR using

    To allow for chemical shift assignment of the backbone atoms, the setup, optimization and acquisition of five 3D NMR experiments (Steps 27, 39, 46, 52 and 63) are explained in this protocol.

  5. PDF CcpNmr Analysis Version 3 Backbone Assignment Tutorial

    5B Open and set the backbone assignment module. enu Assign Backbone. Select the NmrChain: NC:@-. Open the settings (gearboxicon) and select: Match module: HnCANh_1 (or similar 3D SpectrumDisplay) Search module: HnCANh (a diferent 3D SpectrumDisplay) Leave the rest as default and close the settings. 13.

  6. Automated NMR resonance assignments and structure determination using a

    This is typically addressed through first establishing the chemical shift assignments of backbone and sidechain atoms using multiple (6-10) triple-resonance spectra 2, 3, which are then used as ...

  7. Practical aspects of NMR signal assignment in larger and challenging

    Backbone assignment is often considered completed when all signals in an H-N correlation map have been assigned. For small proteins, with little overlap, the first step in defining spin systems consists in peak picking all H/N signals in an HSQC spectrum. ... Many NMR assignment software packages feature automated routines for performing this ...

  8. Automatic Assignment

    AUTOASSIGN. AutoAssign is an artificial intelligence package for automating the analysis of backbone resonance assignments using triple-resonance NMR spectra of proteins. The new AutoAssign distribution automates the assignments of HN, NH, CO, CA, CB, HA, and HB resonances in non-, partially-, and fully-deuterated samples.

  9. Assignment Practice

    Assignment Practice. This section describes how the assignment principles described under Assignment Theory can be but put into practice using the CCCPNmr Analysis software. There are several ways in which triple resonance backbone assignment, in particular, can be approached in CCPNmr Analysis using more or less automated methods.

  10. Automated Resonance Assignment of Proteins Using Heteronuclear 3D NMR

    Heteronuclear 3D NMR has been used to determine protein solution structures for many years. Scalar magnetization transfer through peptide bonds can be observed in triple resonance 3D NMR and thus makes the sequential assignment of a protein backbone more straightforward. In this paper a generic algorithm is proposed to automate protein backbone resonance assignments. The algorithm searches and ...

  11. PDF NMR backbone resonance assignment

    2002) rather than the standard suite of sequential backbone experi-ments. Contact-based assignment turns 'upside down' the standard approach of first deriving a backbone assignment and using it in assigning the NOESY and solving for the structure. It instead analyzes unassigned backbone NOESY data in order to derive the

  12. Time-optimized protein NMR assignment with an integrative ...

    Using the small sets of spectra identified in this study, the NMR measurements, and thus the effort and cost for the NMR assignment of a protein can be reduced several-fold, proteins may be labeled only with 15 N for backbone HN assignment, and larger proteins can likely be assigned, which enlarges the protein family easily accessible by NMR to ...

  13. NMR Backbone Assignment of Large Proteins by Using

    Nuclear magnetic resonance (NMR) is a powerful tool to interrogate protein structure and dynamics residue by residue. However, the prerequisite chemical-shift assignment remains a bottleneck for large proteins due to the fast relaxation and the frequency degeneracy of the 13 C α nuclei. Herein, we present a covariance NMR strategy to assign the backbone chemical shifts by using only HN(CO)CA ...

  14. New Solid-State NMR Method for Protein Backbone Assignment

    A New and Robust Solid-State NMR Method for Protein Backbone Assignment: Band-Selective Homonuclear Cross Polarization. Solid-state nuclear magnetic resonance (NMR) spectroscopy has emerged as a well-established technique due to ongoing developments regarding higher magnetic fields (e.g. Bruker AVANCE 1000 MHz NMR spectrometer), advanced probe ...

  15. Backbone-independent NMR resonance assignments of methyl probes in

    In the conventional approach, backbone assignments are first established using triple-resonance experiments 5. Then, methyl resonances are connected to the backbone using either methyl out-and-back experiments 6 or, more commonly, using 15 N and 13 C edited amide-to-methyl nuclear Overhauser effect (NOE) measurements 7.

  16. NMR Backbone Assignment of VIM-2 and Identification of the Active

    The NMR backbone resonance assignment for the metallo-β-lactamase VIM-2 (84%) is disclosed, providing the basis for rational development of a clinically applicable inhibitor, which will be a long sought tool in fighting antibiotic resistance.

  17. Triple Resonance Backbone Assignment

    The description here assumes that the backbone assignment will be carried out using CBCA(CO)NH and CBCANH spectra. Many of the steps are the same if using HNCA, HN(CO)CA, HNCO and HN(CA)CO spectra, and differences are highlighted in a separate section.. Now comes the actual assignment part which follows the process outlined in the theory pages.First you are shown how to do the assignment ...

  18. Protein NMR Resonance Assignment

    Therefore, the establishment of the sequential assignment procedure was a mile stone for the protein NMR. Backbone amide proton (H N) and α proton (H α) signals were sequentially assigned based on the distance information between H N i and \({\rm H}^{\alpha}_{{\rm i}-1}\), and were aligned on the amino acid sequence of the particular protein ...

  19. Backbone-independent NMR resonance assignments of methyl ...

    Preparation of NMR samples, backbone, and methyl assignments. All proteins were overexpressed in M9 medium in 2 H 2 O containing 2 g l −1 2 H 13 C glucose (Sigma #552151) and 1 g l −1 15 NH 4 Cl.

  20. Home

    Biomolecular NMR Assignments is a dedicated forum for publishing sequence-specific resonance assignments for proteins and nucleic acids. Provides an avenue for depositing these data into a public database at BioMagResBank. Assignment Notes are published in biannual editions in June and December. No page charges or fees for online color images.

  21. Manual Assignment

    The assignment is done using the assignment panel (press a when the mouse is over the peak to be assigned). Select New for the carbon dimension. This will create a new carbon resonance and assign it to this peak. Since it belongs to the same spin system as the N and H resonances, click on Set Same Spin System, so as to add the new carbon ...

  22. Backbone and methyl side-chain resonance assignments of the Fab

    Data were collected at 40 ℃ (313 K) on Bruker AVANCE NEO 600 MHz (side-chain methyls assignment), AVANCE III-HD 700 MHz (backbone assignment) and AVANCE II 900 MHz NMR spectrometers equipped with 5 mm, 1.7 mm, and 5 mm, respectively, TCI cryogenically cooled triple resonance inverse probeheads fitted with z-axis gradients.

  23. Double Resonance Backbone Assignment

    Double Resonance Backbone Assignment. For smaller proteins, it is possible to do the backbone assignment using just 15 N-labelled protein. The spectra used for this are the 15N-NOESY-HSQC and the 15N-TOCSY-HSQC. The 15N-NOESY-HSQC will show for each NH group all 1 H resonances which are within about 5-7Å of the NH hydrogen.

  24. Protein NMR

    Much space and discussion is devoted to practical aspects. The implementation of protein NMR assignment is described using the program CCPNmr Analysis. This program has been developed by CCPN and actively seeks input from the NMR community. CCPNmr Analysis is based on the detailed and well thought-out CCPN Data Model which has the advantage (a ...