Frontend Object API

Mpipi Frontend Object

The Mpipi

Example:

from finches import Mpipi_frontend
mf = Mpipi_frontend()

class finches.frontend.mpipi_frontend.Mpipi_frontend(salt=0.15, dielectric=80.0)[source]

Bases: FinchesFrontend

intermolecular_idr_matrix(seq1, seq2, window_size=31, use_cython=True, use_aliphatic_weighting=True, use_charge_weighting=True, disorder_1=None, disorder_2=None, null_shuffle=False)[source]

Returns the interaction matrix for the two sequences. Specifically this involves decomposing the two sequences into window_size fragments and calculating the inter-fragment epsilon values using a sliding window approach.

Note that we don’t pad the sequence here, so the edges of the matrix start and end at indices that depend on the window size. To avoid confusion, the function also returns the indices for sequence1 and sequence2.

If sequence 1 or sequence 2 contain ‘U’, then the disorder profile is not generated for that sequence.

Parameters:
  • seq1 (str) – Input sequence 1

  • seq2 (str) – Input sequence 2

  • window_size (int) – The window size to use for the interaction matrix calculation. Default is 31.

  • use_cython (bool) – Whether to use the cython implementation of the interaction matrix calculation. Default is True.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme for the interaction matrix calculation. This weights local aliphatic residues based on the number of aliphatic residues adjacent to them. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme for the interaction matrix. This weights local charged residues based on the number of charged residues adjacent to them. Default is True.

  • disorder_1 (bool) – Whether to generate the disorder profile for sequence 1. Default is True. If False, a uniform disorder profile is used (all values=1).

  • disorder_2 (bool) – Whether to generate the disorder profile for sequence 2. Default is True. If False, a uniform disorder profile is used (all values=1).

  • null_shuffle (bool) – Whether to shuffle the sequences before calculating the interaction matrix. Default is False. If set to an integer, this defines the number of shuffles used for each sequence; 100 shuffles is recommended.

Returns:

A tuple containing the interaction matrix, disorder profile for sequence 1, and disorder profile for sequence 2.

[0] : The interaction matrix, which is itself a tuple of 3 elements. The first is the matrix of sliding epsilon values, and the second and third are the indices that map sequence positions in sequence1 and sequence2 to the matrix

[1] disorder profile for sequence 1. Will be all 1s if disorder_1 is False

[2] disorder profile for sequence 2. Will be all 1s if disorder_2 is False

Return type:

tuple
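The sliding-window decomposition described above can be illustrated with a small pure-Python sketch. It does not call finches: toy_score is a hypothetical stand-in for the real inter-fragment epsilon calculation, and the choice of 0-based window-centre indices is an assumption made for illustration only. The sketch shows why an unpadded matrix has len(seq) - window_size + 1 rows or columns, and why separate index vectors are needed to map matrix cells back to sequence positions.

```python
def fragments(seq, window_size):
    """Return every contiguous window_size fragment of seq (no padding)."""
    return [seq[i:i + window_size] for i in range(len(seq) - window_size + 1)]


def toy_score(frag1, frag2):
    """Hypothetical stand-in for an inter-fragment epsilon value."""
    charge = {"K": 1, "R": 1, "D": -1, "E": -1}
    q1 = sum(charge.get(aa, 0) for aa in frag1)
    q2 = sum(charge.get(aa, 0) for aa in frag2)
    # like charges give a positive value, opposite charges a negative one
    return q1 * q2 / (len(frag1) * len(frag2))


def sliding_matrix(seq1, seq2, window_size):
    """Build the unpadded fragment-by-fragment matrix and its index vectors."""
    frags1 = fragments(seq1, window_size)
    frags2 = fragments(seq2, window_size)
    matrix = [[toy_score(f1, f2) for f2 in frags2] for f1 in frags1]
    half = (window_size - 1) // 2
    # assumed convention: 0-based position of each window's central residue
    idx1 = [i + half for i in range(len(frags1))]
    idx2 = [j + half for j in range(len(frags2))]
    return matrix, idx1, idx2


seq1 = "KKKKKGSGSGSEEEEE"  # 16 residues, toy sequence
seq2 = "DDDDDGSGSGSRRRRR"  # 16 residues, toy sequence
matrix, idx1, idx2 = sliding_matrix(seq1, seq2, window_size=5)
print(len(matrix), len(matrix[0]))  # 12 12
print(idx1[0], idx1[-1])            # 2 13
```

With a window of 5 and sequences of length 16, the first and last two residues of each sequence never sit at the centre of a full window, which is why the matrix edges do not line up with the sequence ends.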

interaction_figure(seq1, seq2, window_size=31, use_cython=True, use_aliphatic_weighting=True, use_charge_weighting=True, tic_frequency=100, seq1_domains=[], seq2_domains=[], seq1_lines=[], seq2_lines=[], linewidth=1, vmin=-3, vmax=3, cmap='PRGn', fname=None, zero_folded=True, no_disorder=False, null_shuffle=False, plot_rectangles=None)[source]

Function to generate an interaction matrix figure between two sequences. This does all the calculation on the backend and formats a figure with parallel disorder tracks alongside the interaction matrix.

If sequence 1 or sequence 2 contain ‘U’, then the disorder profile is not generated for that sequence.

Parameters:
  • seq1 (str) – Input sequence 1

  • seq2 (str) – Input sequence 2

  • window_size (int) – Size of the window to use for the interaction matrix calculation. Note this must be an odd number and will be converted to an odd number if it is not. Default is 31.

  • use_cython (bool) – Whether to use the cython implementation of the interaction matrix (always use this if you can). Default is True.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme for the interaction matrix calculation. This weights local aliphatic residues based on the number of aliphatic residues adjacent to them. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme for the interaction matrix. This weights local charged residues based on the number of charged residues adjacent to them. Default is True.

  • tic_frequency (int) – Spacing of the tick marks on the plot axes. Default is 100.

  • seq1_domains (list) – List of tuples/lists containing the start and end positions of domains in sequence 1. This means these can be easily highlighted in the plot.

  • seq2_domains (list) – List of tuples/lists containing the start and end positions of domains in sequence 2. This means these can be easily highlighted in the plot.

  • seq1_lines (list) – List of values that will draw lines onto the plot along sequence 1.

  • seq2_lines (list) – List of values that will draw lines onto the plot along sequence 2.

  • vmin (float) – Minimum value for the interaction matrix color scale. Default is -3.

  • vmax (float) – Maximum value for the interaction matrix color scale. Default is 3.

  • cmap (str) – Colormap to use for the interaction matrix. Default is ‘PRGn’.

  • fname (str) – Filename to save the figure to. If None, the figure will be displayed

  • disorder_1 (bool) – Whether to include the disorder profile for sequence 1. Default is True.

  • disorder_2 (bool) – Whether to include the disorder profile for sequence 2. Default is True.

  • no_disorder (bool) – Whether to omit the disorder profiles. Default is False. If True, the disorder profiles will not be included.

  • null_shuffle (bool) – Whether to shuffle the sequences before calculating the interaction matrix. Default is False. If set to an integer, this defines the number of shuffles used for each sequence; 100 shuffles is recommended.

  • plot_rectangles (list) – If a list is provided it should be a list of lists, where each sublist has the following information: [seq1_start, seq1_end, seq2_start, seq2_end, color, alpha, kwargs]. Based on this information, rectangles will be drawn on the plot to highlight specific regions. Default is None.

Returns:

A tuple containing the figure and the axes objects for the main plot, the top disorder plot, the right disorder plot and the colorbar:

    fig : matplotlib.figure.Figure (from plt.figure())

    im : matplotlib.image.AxesImage (from plt.imshow())

    ax_main : matplotlib.axes.Axes (from plt.subplot2grid())

    ax_top : matplotlib.axes.Axes (from plt.subplot2grid())

    ax_right : matplotlib.axes.Axes (from plt.subplot2grid())

    ax_colorbar : matplotlib.axes.Axes (from plt.subplot2grid())
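The plot_rectangles parameter expects each region as a seven-element sublist in a fixed order, which is easy to get wrong by hand. The following is a minimal sketch of a helper that builds such entries. The helper name rectangle, its sanity checks, and the assumption that the trailing kwargs element is a dict of extra drawing options are all ours and are not part of finches.

```python
def rectangle(seq1_start, seq1_end, seq2_start, seq2_end,
              color="red", alpha=0.3, **kwargs):
    """Hypothetical helper: build one plot_rectangles entry in documented order."""
    # sanity checks chosen for this sketch, not taken from the library
    if seq1_start >= seq1_end or seq2_start >= seq2_end:
        raise ValueError("each start must be smaller than its end")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # order: [seq1_start, seq1_end, seq2_start, seq2_end, color, alpha, kwargs]
    return [seq1_start, seq1_end, seq2_start, seq2_end, color, alpha, kwargs]


plot_rectangles = [
    rectangle(10, 60, 200, 260, color="orange", alpha=0.2),
    rectangle(100, 150, 20, 80, color="blue", alpha=0.4, linewidth=1),
]
for entry in plot_rectangles:
    print(entry)
```

The resulting list could then be passed as plot_rectangles=plot_rectangles to interaction_figure.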

build_phase_diagram(seq, use_aliphatic_weighting=True, use_charge_weighting=True)

Function to build a homotypic phase diagram for a given sequence. This is done by calculating the overall epsilon for the sequence, and then combining this with closed-form expressions for the binodal and spinodal lines.

Parameters:
  • seq (str) – The protein sequence

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

Returns:

  • [0] - Dilute phase concentrations (array of len=N) in Phi

  • [1] - Dense phase concentrations (array of len=N) in Phi

  • [2] - List with [0]: critical Phi and [1]: Critical T

  • [3] - List of temperatures that match with the dense and dilute phase concentrations

  • [4] - Dilute phase concentrations (array of len=N) in Phi for spinodal

  • [5] - Dense phase concentrations (array of len=N) in Phi for spinodal

  • [6] - List with [0]: critical Phi and [1]: Critical T for spinodal

  • [7] - List of temperatures that match with the dense and dilute phase concentrations for spinodal

Return type:

tuple of np.arrays
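Because the return value is an eight-element tuple addressed by position, it can help to give the positions names. The sketch below wraps a tuple with the documented layout in a namedtuple. The field names are our own invention and the numbers are placeholders with no physical meaning; with a real frontend object the wrapping step would be PhaseDiagram(*mf.build_phase_diagram(seq)).

```python
from collections import namedtuple

# field order follows the documented return positions [0] to [7]
PhaseDiagram = namedtuple("PhaseDiagram", [
    "binodal_dilute", "binodal_dense", "binodal_critical", "binodal_temperatures",
    "spinodal_dilute", "spinodal_dense", "spinodal_critical", "spinodal_temperatures",
])

# placeholder values standing in for the output of build_phase_diagram
raw = (
    [0.01, 0.05], [0.60, 0.40], [0.20, 310.0], [280.0, 300.0],
    [0.03, 0.08], [0.50, 0.35], [0.20, 310.0], [280.0, 300.0],
)
diagram = PhaseDiagram(*raw)

# named access and positional access refer to the same objects
critical_phi, critical_t = diagram.binodal_critical
print(critical_phi, critical_t)
print(diagram[3] is diagram.binodal_temperatures)
```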

Examples

The return value looks somewhat overwhelming, but plotting the resulting binodal takes only a few lines:

import matplotlib.pyplot as plt

# assuming mf is a frontend object and seq is a protein sequence
B = mf.build_phase_diagram(seq)

# binodal low arm
plt.plot(B[0], B[3], 'blue', label='sequence name')

# binodal high arm
plt.plot(B[1], B[3], 'blue')

plt.legend()

epsilon(seq1, seq2, use_aliphatic_weighting=True, use_charge_weighting=True)

Returns the epsilon value associated with the two sequences.

Parameters:
  • seq1 (str) – Input sequence 1

  • seq2 (str) – Input sequence 2

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme for the interaction matrix calculation. This weights local aliphatic residues based on the number of aliphatic residues adjacent to them. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme for the interaction matrix. This weights local charged residues based on the number of charged residues adjacent to them. Default is True.

Returns:

The epsilon value for the two sequences.

Return type:

float

epsilon_vectors(seq1, seq2, use_aliphatic_weighting=True, use_charge_weighting=True)

Returns the attractive and repulsive vectors associated with the interaction between the two sequences.

Parameters:
  • seq1 (str) – Input sequence 1

  • seq2 (str) – Input sequence 2

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme for the interaction matrix calculation. This weights local aliphatic residues based on the number of aliphatic residues adjacent to them. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme for the interaction matrix. This weights local charged residues based on the number of charged residues adjacent to them. Default is True.

Returns:

The attractive and repulsive vectors for the two sequences.

Return type:

tuple

per_residue_attractive_vector(seq1, seq2, window_size=31, use_cython=True, use_aliphatic_weighting=True, use_charge_weighting=True, return_total=False, attractive_threshold=0, smoothing_window=20, poly_order=3)

Function to calculate the per-residue attractive vector for a given pair of sequences. This is calculated as the sum of the attractive interactions for each residue in the first sequence with all residues in the second sequence. Specifically, this is an average over all attractive values (i.e. where value < 0) using the inter-sequence matrix.

If return_total is True, the function will return the total sum of attractive interactions between the two sequences instead of the average.

This is (potentially) interesting because if we just take the average of a region, it may be very attractive in some places but repulsive elsewhere. However, repulsive regions in an IDR can avoid each other while attractive regions attract, so this allows you to identify the putative ‘sticker’ regions without confounding by repulsive regions.

Parameters:
  • seq1 (str) – The first sequence

  • seq2 (str) – The second sequence

  • window_size (int) – The window size for the intermolecular matrix. Default is 31.

  • use_cython (bool) – Whether to use the cython implementation of the intermolecular matrix calculation. Default is True.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

  • return_total (bool) – If True, return the total sum of attractive interactions between the two sequences. Sometimes you may want this.

  • attractive_threshold (float) – The threshold for what is considered attractive. Default is 0 (i.e. only negative values are considered attractive). If changed, anything below this value will be considered attractive.

  • smoothing_window (int) – The window size for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False no smoothing is applied. Default is 20.

  • poly_order (int) – The polynomial order for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False no smoothing is applied. Default is 3.

Returns:

[0] np.array – Indices of the residues in the first sequence

[1] np.array – The per-residue attractive vector

Return type:

tuple of np.arrays
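The thresholded averaging described above can be sketched on a toy matrix. This is a minimal pure-Python illustration, not the finches implementation: it assumes rows correspond to positions in the first sequence, it returns 0 for rows with no attractive entries (our choice), and it omits the Savgol smoothing step entirely.

```python
def per_row_attractive(matrix, attractive_threshold=0, return_total=False):
    """For each row, average (or sum) the entries below attractive_threshold."""
    vector = []
    for row in matrix:
        attractive = [v for v in row if v < attractive_threshold]
        if not attractive:
            vector.append(0.0)  # assumption: no attractive entries gives 0
        elif return_total:
            vector.append(sum(attractive))
        else:
            vector.append(sum(attractive) / len(attractive))
    return vector


toy_matrix = [
    [-2.0, 1.0, -4.0],  # two attractive entries
    [3.0, 0.5, 2.0],    # no attractive entries
    [-1.0, -1.0, 5.0],  # two attractive entries
]
print(per_row_attractive(toy_matrix))                     # [-3.0, 0.0, -1.0]
print(per_row_attractive(toy_matrix, return_total=True))  # [-6.0, 0.0, -2.0]
```

Note how the first row averages to -3.0 even though it contains a repulsive entry; a plain row mean would dilute the attractive signal to about -1.67.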

per_residue_repulsive_vector(seq1, seq2, window_size=31, use_cython=True, use_aliphatic_weighting=True, use_charge_weighting=True, return_total=False, repulsive_threshold=0, smoothing_window=20, poly_order=3)

Function to calculate the per-residue repulsive vector for a given pair of sequences. This is calculated as the sum of the repulsive interactions for each residue in the first sequence with all residues in the second sequence. Specifically, this is an average over all repulsive values (i.e. where value > 0) using the inter-sequence matrix.

If return_total is True, the function will return the total sum of repulsive interactions between the two sequences instead of the average.

This is (potentially) interesting because if we just take the average of a region, it may be very attractive in some places but repulsive elsewhere. However, repulsive regions in an IDR can avoid each other while attractive regions attract, so this allows you to identify the putative ‘sticker’ regions without confounding by repulsive regions.

Parameters:
  • seq1 (str) – The first sequence

  • seq2 (str) – The second sequence

  • window_size (int) – The window size for the intermolecular matrix. Default is 31.

  • use_cython (bool) – Whether to use the cython implementation of the intermolecular matrix calculation. Default is True.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

  • return_total (bool) – If True, return the total sum of repulsive interactions between the two sequences. Sometimes you may want this.

  • repulsive_threshold (float) – The threshold for what is considered repulsive. Default is 0 (i.e. only positive values are considered repulsive). If changed, anything above this value will be considered repulsive.

  • smoothing_window (int) – The window size for the Savgol filter. This is used to smooth the per-residue repulsive vector. If set to False no smoothing is applied. Default is 20.

  • poly_order (int) – The polynomial order for the Savgol filter. This is used to smooth the per-residue repulsive vector. If set to False no smoothing is applied. Default is 3.

Returns:

[0] np.array – Indices of the residues in the first sequence

[1] np.array – The per-residue repulsive vector

Return type:

tuple of np.arrays

plot_multiple_phase_diagrams(seq_dict, use_aliphatic_weighting=True, use_charge_weighting=True, tc_ref=None, line_style='-', line_width=0.5, xlim=None, ylim=None, xlog=False, width=2.2, height=1.2, filename=None)

Function to plot the phase diagrams for multiple sequences on one figure. This is done by calculating the overall epsilon for each sequence, and then combining this with closed-form expressions for the binodal.

Parameters:
  • seq_dict (dict) – Dictionary where keys are sequence names and values are a 2-position list, where the first element is the sequence and the second element the color to plot.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

  • xlim (tuple) – The x-axis limits. Default is None, meaning it is determined automatically. If provided, must be a 2-position tuple, e.g. (xmin, xmax).

  • ylim (tuple) – The y-axis limits. Default is None, meaning it is determined automatically. If provided, must be a 2-position tuple, e.g. (ymin, ymax).

  • xlog (bool) – Whether to plot the x-axis in log scale. Default is False.

  • width (float) – The width of the figure in inches. Default is 2.2.

  • height (float) – The height of the figure in inches. Default is 1.2.

  • filename (str) – The filename to save the figure. Default is None, meaning the figure is not saved. If provided, the filename must include the extension.

Returns:

[0] - The complex tuple associated with the phase diagram (see the function signature of build_phase_diagram for more details)

[1] - Figure object

[2] - Axes object

Return type:

Tuple
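The seq_dict layout described in the Parameters list (name mapped to a two-position [sequence, color] list) can be built and checked as in the sketch below. The sequences are made-up toy strings, the colour names are assumed to be ordinary matplotlib colour names, and check_seq_dict is a hypothetical helper of ours rather than anything provided by finches.

```python
def check_seq_dict(seq_dict):
    """Verify every value is a 2-position [sequence, color] list or tuple."""
    for name, entry in seq_dict.items():
        if len(entry) != 2:
            raise ValueError(f"{name}: expected [sequence, color], got {entry!r}")
        sequence, color = entry
        if not isinstance(sequence, str) or not sequence:
            raise ValueError(f"{name}: sequence must be a non-empty string")
        if not isinstance(color, str):
            raise ValueError(f"{name}: color must be a string")
    return True


# keys are labels, values are [sequence, color]
seq_dict = {
    "toy_polar": ["GSGSGSGSGSQQQQNNNNGSGSGSGS", "black"],
    "toy_aromatic": ["GSYGSYGSFGSYGSFGSYGSYGSF", "red"],
}
print(check_seq_dict(seq_dict))
```

A dictionary in this shape could then be passed as the first argument to plot_multiple_phase_diagrams.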

plot_phase_diagram(seq, use_aliphatic_weighting=True, use_charge_weighting=True, line_color='k', line_style='-', line_width=0.5, xlim=None, ylim=None, xlog=False, width=2.2, height=1.2, filename=None)

Function to plot the phase diagram for a given sequence. This is done by calculating the overall epsilon for the sequence, and then combining this with closed-form expressions for the binodal.

Parameters:
  • seq (str) – The protein sequence

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

  • xlim (tuple) – The x-axis limits. Default is None, meaning it is determined automatically. If provided, must be a 2-position tuple, e.g. (xmin, xmax).

  • ylim (tuple) – The y-axis limits. Default is None, meaning it is determined automatically. If provided, must be a 2-position tuple, e.g. (ymin, ymax).

  • xlog (bool) – Whether to plot the x-axis in log scale. Default is False.

  • width (float) – The width of the figure in inches. Default is 2.2.

  • height (float) – The height of the figure in inches. Default is 1.2.

  • filename (str) – The filename to save the figure. Default is None, meaning the figure is not saved. If provided, the filename must include the extension.

Returns:

[0] - The complex tuple associated with the phase diagram (see the function signature of build_phase_diagram for more details)

[1] - Figure object

[2] - Axes object

Return type:

Tuple

plot_protein_nucleic_vector(seq, fragsize=21, smoothing_window=30, poly_order=3, domains=[], domain_color='yellow', domain_alpha=0.3, trace_width=3, vmin=-0.8, vmax=0.8, tic_frequency=100, cmap='PRGn', fname=None, zero_folded=True, show_grid=False, figsize=(4, 1.5), ylim=[-1.2, 1.2])

Function to plot the per-residue attractive vector for a given protein sequence. This is calculated as the sliding-window average of a fragsize region of the protein with a fragsize region of poly-U RNA.

Parameters:
  • seq (str) – The protein sequence

  • fragsize (int) – The size of the sliding window. Must be an odd number. Default is 21.

  • smoothing_window (int) – The window size for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False no smoothing is applied.

  • poly_order (int) – The polynomial order for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False no smoothing is applied.

  • domains (list of tuples) – A list of tuples containing the start and end indices of folded domains in the protein sequence. These will be shaded in the plot.

  • domain_color (str) – The color to use for the passed domains. Default is ‘yellow’.

  • domain_alpha (float) – The alpha value for the passed domains. Default is 0.3.

  • trace_width (float) – The width of the line for the per-residue attractive vector. Default is 3.

  • vmin (float) – The minimum value for the color scale. Default is -0.8.

  • vmax (float) – The maximum value for the color scale. Default is 0.8.

  • cmap (str) – The colormap to use for the plot. Default is ‘PRGn’.

  • fname (str) – The filename to save the plot to. If set to None, the plot will be displayed in the console. Default is None.

  • zero_folded (bool) – If True, the folded domains will be shaded in the plot. Default is True.

  • show_grid (bool) – If True, a grid will be displayed on the plot. Default is False.

  • figsize (tuple) – The size of the figure. Default is (4, 1.5).

  • ylim (list) – The y-axis limits for the plot. Default is [-1.2,1.2].

Returns:

fig, ax – If fname is set to None, the plot will be displayed in the console. If fname is set to a filename, the plot will be saved to that file.

Return type:

matplotlib figure and axis objects

protein_nucleic_vector(seq, fragsize=21, smoothing_window=30, poly_order=3)

Function to calculate the per-residue attractive vector for a given protein sequence. This is calculated as the sliding-window average of a fragsize region of the protein with a fragsize region of poly-U RNA.

The two-position return value contains the indices of the residues in the protein sequence and the per-residue attractive vector. Note that indices start at (fragsize-1)/2 and end at len(seq)-(fragsize+1)/2. This is because the sliding window is centered on each residue.

Parameters:
  • seq (str) – The protein sequence

  • fragsize (int) – The size of the sliding window. Must be an odd number. Default is 21.

  • smoothing_window (int) – The window size for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False no smoothing is applied.

  • poly_order (int) – The polynomial order for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False no smoothing is applied.

Returns:

[0] np.array – Indices of the residues in the protein sequence

[1] np.array – The per-residue attractive vector

Return type:

tuple of np.arrays
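The index range of a centered sliding window can be worked out with a short sketch. It assumes 0-based residue indices and an odd fragsize, and window_center_indices is a hypothetical helper written for this illustration; whether finches reports 0-based or 1-based positions is not stated here.

```python
def window_center_indices(seq_length, fragsize):
    """0-based indices of residues that can sit at the centre of a full window."""
    if fragsize % 2 == 0:
        raise ValueError("fragsize must be odd so the window has a central residue")
    if fragsize > seq_length:
        return []  # no full window fits
    half = (fragsize - 1) // 2
    # first centre is half; last centre leaves half residues to its right
    return list(range(half, seq_length - half))


indices = window_center_indices(seq_length=100, fragsize=21)
print(indices[0], indices[-1], len(indices))  # 10 89 80
```

For a 100-residue sequence and the default fragsize of 21, the vector therefore covers 80 residues, and the first and last 10 positions have no value.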

CALVADOS Frontend Object

class finches.frontend.calvados_frontend.CALVADOS_frontend(salt=0.15, pH=7.4, temp=288)[source]

Bases: FinchesFrontend

intermolecular_idr_matrix(seq1, seq2, window_size=31, use_cython=True, use_aliphatic_weighting=True, use_charge_weighting=True, disorder_1=True, disorder_2=True, null_shuffle=False)[source]

Returns the interaction matrix for the two sequences. Specifically this involves decomposing the two sequences into window_size fragments and calculating the inter-fragment epsilon values using a sliding window approach.

Note that we don’t pad the sequence here, so the edges of the matrix start and end at indices that depend on the window size. To avoid confusion, the function also returns the indices for sequence1 and sequence2.

Parameters:
  • seq1 (str) – Input sequence 1

  • seq2 (str) – Input sequence 2

  • window_size (int) – The window size to use for the interaction matrix calculation. Default is 31.

  • use_cython (bool) – Whether to use the cython implementation of the interaction matrix calculation. Default is True.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme for the interaction matrix calculation. This weights local aliphatic residues based on the number of aliphatic residues adjacent to them. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme for the interaction matrix. This weights local charged residues based on the number of charged residues adjacent to them. Default is True.

  • disorder_1 (bool) – Whether to generate the disorder profile for sequence 1. Default is True. If False, a uniform disorder profile is used (all values=1).

  • disorder_2 (bool) – Whether to generate the disorder profile for sequence 2. Default is True. If False, a uniform disorder profile is used (all values=1).

  • null_shuffle (bool) – Whether to shuffle the sequences before calculating the interaction matrix. Default is False. If set to an integer, this defines the number of shuffles used for each sequence; 100 shuffles is recommended.

Returns:

A tuple containing the interaction matrix, disorder profile for sequence 1, and disorder profile for sequence 2.

[0] : The interaction matrix, which is itself a tuple of 3 elements. The first is the matrix of sliding epsilon values, and the second and third are the indices that map sequence positions in sequence1 and sequence2 to the matrix

[1] disorder profile for sequence 1. Will be all 1s if disorder_1 is False

[2] disorder profile for sequence 2. Will be all 1s if disorder_2 is False

Return type:

tuple
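The idea behind null_shuffle (comparing a sequence pair against composition-matched shuffles) can be sketched without the library. How finches aggregates the shuffles is not described on this page, so the averaging below is an assumption, and toy_score is a hypothetical stand-in for a real interaction score.

```python
import random


def shuffled_copies(seq, n_shuffles, seed=0):
    """Return n_shuffles composition-preserving shuffles of seq."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    copies = []
    for _ in range(n_shuffles):
        residues = list(seq)
        rng.shuffle(residues)
        copies.append("".join(residues))
    return copies


def null_mean(seq1, seq2, score, n_shuffles=100):
    """Average score over paired shuffles of both sequences (assumed scheme)."""
    pairs = zip(shuffled_copies(seq1, n_shuffles, seed=1),
                shuffled_copies(seq2, n_shuffles, seed=2))
    return sum(score(a, b) for a, b in pairs) / n_shuffles


def toy_score(a, b):
    """Hypothetical score: positions where K on one chain faces E on the other."""
    return sum(1 for x, y in zip(a, b) if {x, y} == {"K", "E"})


seq1 = "KKKKKKGGGGGG"
seq2 = "EEEEEEGGGGGG"
observed = toy_score(seq1, seq2)
baseline = null_mean(seq1, seq2, toy_score)
print(observed)   # 6 for the unshuffled pair
print(baseline)   # average over 100 shuffled pairs
```

Comparing the observed score with the shuffled baseline separates effects of residue patterning from effects of composition alone, since every shuffle keeps the same residues.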

epsilon(seq1, seq2, use_aliphatic_weighting=True, use_charge_weighting=True)[source]

Returns the epsilon value associated with the two sequences. Note that CALVADOS does not currently support RNA.

Parameters:
  • seq1 (str) – Input sequence 1

  • seq2 (str) – Input sequence 2

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme for the interaction matrix calculation. This weights local aliphatic residues based on the number of aliphatic residues adjacent to them. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme for the interaction matrix. This weights local charged residues based on the number of charged residues adjacent to them. Default is True.

Returns:

The epsilon value for the two sequences.

Return type:

float

interaction_figure(seq1, seq2, window_size=31, use_cython=True, use_aliphatic_weighting=True, use_charge_weighting=True, tic_frequency=100, seq1_domains=[], seq2_domains=[], seq1_lines=[], seq2_lines=[], vmin=-7.5, vmax=7.5, cmap='PRGn', fname=None, zero_folded=True, no_disorder=False, null_shuffle=False, plot_rectangles=None)[source]

Function to generate an interaction matrix figure between two sequences. This does all the calculation on the backend and formats a figure with parallel disorder tracks alongside the interaction matrix.

Parameters:
  • seq1 (str) – Input sequence 1

  • seq2 (str) – Input sequence 2

  • window_size (int) – Size of the window to use for the interaction matrix calculation. Note this must be an odd number and will be converted to an odd number if it is not. Default is 31.

  • use_cython (bool) – Whether to use the cython implementation of the interaction matrix (always use this if you can). Default is True.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme for the interaction matrix calculation. This weights local aliphatic residues based on the number of aliphatic residues adjacent to them. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme for the interaction matrix. This weights local charged residues based on the number of charged residues adjacent to them. Default is True.

  • tic_frequency (int) – Spacing of the tick marks on the plot axes. Default is 100.

  • seq1_domains (list) – List of tuples/lists containing the start and end positions of domains in sequence 1. This means these can be easily highlighted in the plot.

  • seq2_domains (list) – List of tuples/lists containing the start and end positions of domains in sequence 2. This means these can be easily highlighted in the plot.

  • seq1_lines (list) – List of values that will draw lines onto the plot along sequence 1.

  • seq2_lines (list) – List of values that will draw lines onto the plot along sequence 2.

  • vmin (float) – Minimum value for the interaction matrix color scale. Default is -7.5.

  • vmax (float) – Maximum value for the interaction matrix color scale. Default is 7.5.

  • cmap (str) – Colormap to use for the interaction matrix. Default is ‘PRGn’.

  • fname (str) – Filename to save the figure to. If None, the figure will be displayed

  • disorder_1 (bool) – Whether to include the disorder profile for sequence 1. Default is True.

  • disorder_2 (bool) – Whether to include the disorder profile for sequence 2. Default is True.

  • no_disorder (bool) – Whether to omit the disorder profiles. Default is False. If True, the disorder profiles will not be included.

  • null_shuffle (bool) – Whether to shuffle the sequences before calculating the interaction matrix. Default is False. If set to an integer, this defines the number of shuffles used for each sequence; 100 shuffles is recommended.

  • plot_rectangles (list) – If a list is provided it should be a list of lists, where each sublist has the following information: [seq1_start, seq1_end, seq2_start, seq2_end, color, alpha, kwargs]. Based on this information, rectangles will be drawn on the plot to highlight specific regions. Default is None.

Returns:

A tuple containing the figure and the axes objects for the main plot, the top disorder plot, the right disorder plot and the colorbar:

    fig : matplotlib.figure.Figure (from plt.figure())

    im : matplotlib.image.AxesImage (from plt.imshow())

    ax_main : matplotlib.axes.Axes (from plt.subplot2grid())

    ax_top : matplotlib.axes.Axes (from plt.subplot2grid())

    ax_right : matplotlib.axes.Axes (from plt.subplot2grid())

    ax_colorbar : matplotlib.axes.Axes (from plt.subplot2grid())

protein_nucleic_vector(fragsize=31, smoothing_window=30, poly_order=3)[source]

Stub function to calculate the protein-nucleic acid interaction vector. CALVADOS does not currently support RNA.

build_phase_diagram(seq, use_aliphatic_weighting=True, use_charge_weighting=True)

Function to build a homotypic phase diagram for a given sequence. This is done by calculating the overall epsilon for the sequence, and then combining this with closed-form expressions for the binodal and spinodal lines.

Parameters:
  • seq (str) – The protein sequence

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

Returns:

  • [0] - Dilute phase concentrations (array of len=N) in Phi

  • [1] - Dense phase concentrations (array of len=N) in Phi

  • [2] - List with [0]: critical Phi and [1]: Critical T

  • [3] - List of temperatures that match with the dense and dilute phase concentrations

  • [4] - Dilute phase concentrations (array of len=N) in Phi for spinodal

  • [5] - Dense phase concentrations (array of len=N) in Phi for spinodal

  • [6] - List with [0]: critical Phi and [1]: Critical T for spinodal

  • [7] - List of temperatures that match with the dense and dilute phase concentrations for spinodal

Return type:

tuple of np.arrays

Examples

The return value looks somewhat overwhelming, but plotting the resulting binodal takes only a few lines:

import matplotlib.pyplot as plt

# assuming mf is a frontend object and seq is a protein sequence
B = mf.build_phase_diagram(seq)

# binodal low arm
plt.plot(B[0], B[3], 'blue', label='sequence name')

# binodal high arm
plt.plot(B[1], B[3], 'blue')

plt.legend()

epsilon_vectors(seq1, seq2, use_aliphatic_weighting=True, use_charge_weighting=True)

Returns the attractive and repulsive vectors associated with the interaction between the two sequences.

Parameters:
  • seq1 (str) – Input sequence 1

  • seq2 (str) – Input sequence 2

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme for the interaction matrix calculation. This weights local aliphatic residues based on the number of aliphatic residues adjacent to them. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme for the interaction matrix. This weights local charged residues based on the number of charged residues adjacent to them. Default is True.

Returns:

The attractive and repulsive vectors for the two sequences.

Return type:

tuple

per_residue_attractive_vector(seq1, seq2, window_size=31, use_cython=True, use_aliphatic_weighting=True, use_charge_weighting=True, return_total=False, attractive_threshold=0, smoothing_window=20, poly_order=3)

Function to calculate the per-residue attractive vector for a given pair of sequences. This is calculated as the sum of the attractive interactions for each residue in the first sequence with all residues in the second sequence. Specifically, this is an average over all attractive values (i.e. where value < 0) using the inter-sequence matrix.

If return_total is True, the function will return the total sum of attractive interactions between the two sequences instead of the average.

This is (potentially) interesting because if we just take the average of a region, it may be very attractive in some places but repulsive elsewhere. However, repulsive regions in an IDR can avoid each other while attractive regions attract, so this allows you to identify the putative ‘sticker’ regions without confounding by repulsive regions.

Parameters:
  • seq1 (str) – The first sequence

  • seq2 (str) – The second sequence

  • window_size (int) – The window size for the intermolecular matrix. Default is 31.

  • use_cython (bool) – Whether to use the cython implementation of the intermolecular matrix calculation. Default is True.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

  • return_total (bool) – If True, return the total sum of attractive interactions between the two sequences instead of the average. Default is False.

  • attractive_threshold (float) – The threshold for what is considered attractive. Default is 0 (i.e. only negative values are considered attractive). If changed, anything less than this value will be considered attractive.

  • smoothing_window (int) – The window size for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False, no smoothing is applied. Default is 20.

  • poly_order (int) – The polynomial order for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False, no smoothing is applied. Default is 3.

Returns:

[0]np.array

Indices of the residues in the first sequence

[1]np.array

The per-residue attractive vector

Return type:

tuple of np.arrays

per_residue_repulsive_vector(seq1, seq2, window_size=31, use_cython=True, use_aliphatic_weighting=True, use_charge_weighting=True, return_total=False, repulsive_threshold=0, smoothing_window=20, poly_order=3)

Function to calculate the per-residue repulsive vector for a given pair of sequences. For each residue in the first sequence, this summarizes its repulsive interactions with all residues in the second sequence. Specifically, this is an average over all repulsive values (i.e. where value > 0) using the inter-sequence matrix.

If return_total is True, the function will return the total sum of repulsive interactions between the two sequences instead of the average.

As with the attractive vector, this is (potentially) interesting because a plain average over a region can hide strong repulsion in one place behind attraction elsewhere. Considering only the repulsive values avoids that confounding.
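
The effect of repulsive_threshold can be sketched the same way on a toy matrix. This is an illustrative sketch only, not the finches implementation: the helper name repulsive_profile, the matrix values, and the 0.0 reported for rows with no repulsive values are assumptions.

```python
def repulsive_profile(matrix, threshold=0.0, return_total=False):
    """Summarize, per row, only the values strictly above `threshold`.

    Rows without any repulsive value are reported as 0.0 (an assumption
    of this sketch).
    """
    profile = []
    for row in matrix:
        repulsive = [v for v in row if v > threshold]
        if not repulsive:
            profile.append(0.0)
        elif return_total:
            profile.append(sum(repulsive))
        else:
            profile.append(sum(repulsive) / len(repulsive))
    return profile


# Toy inter-sequence matrix with made-up values.
toy_matrix = [
    [-2.0, 1.0, 2.0, 3.0],
    [0.5, 0.5, -1.0, 0.5],
]

default_cutoff = repulsive_profile(toy_matrix)                # values > 0
strict_cutoff = repulsive_profile(toy_matrix, threshold=1.0)  # values > 1

print(default_cutoff)  # [2.0, 0.5]
print(strict_cutoff)   # [2.5, 0.0]
```

Raising the threshold drops weakly repulsive entries, so the second row no longer contributes while the first row's average increases.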

Parameters:
  • seq1 (str) – The first sequence

  • seq2 (str) – The second sequence

  • window_size (int) – The window size for the intermolecular matrix. Default is 31.

  • use_cython (bool) – Whether to use the cython implementation of the intermolecular matrix calculation. Default is True.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

  • return_total (bool) – If True, return the total sum of repulsive interactions between the two sequences instead of the average. Default is False.

  • repulsive_threshold (float) – The threshold for what is considered repulsive. Default is 0 (i.e. only positive values are considered repulsive). If changed, anything above this value will be considered repulsive.

  • smoothing_window (int) – The window size for the Savgol filter. This is used to smooth the per-residue repulsive vector. If set to False, no smoothing is applied. Default is 20.

  • poly_order (int) – The polynomial order for the Savgol filter. This is used to smooth the per-residue repulsive vector. If set to False, no smoothing is applied. Default is 3.

Returns:

[0]np.array

Indices of the residues in the first sequence

[1]np.array

The per-residue repulsive vector

Return type:

tuple of np.arrays

plot_multiple_phase_diagrams(seq_dict, use_aliphatic_weighting=True, use_charge_weighting=True, tc_ref=None, line_style='-', line_width=0.5, xlim=None, ylim=None, xlog=False, width=2.2, height=1.2, filename=None)

Function to plot the phase diagrams for multiple sequences. For each sequence, this is done by calculating the overall epsilon for the sequence, and then combining this with closed-form expressions for the binodal.

Parameters:
  • seq_dict (dict) – Dictionary where keys are sequence names and values are a 2-position list, where the first element is the sequence and the second element the color to plot.

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

  • xlim (tuple) – The x-axis limits. Default is None, meaning they are determined automatically. If provided, must be a 2-position tuple, e.g. (xmin, xmax).

  • ylim (tuple) – The y-axis limits. Default is None, meaning they are determined automatically. If provided, must be a 2-position tuple, e.g. (ymin, ymax).

  • xlog (bool) – Whether to plot the x-axis in log scale. Default is False.

  • width (float) – The width of the figure in inches. Default is 2.2.

  • height (float) – The height of the figure in inches. Default is 1.2.

  • filename (str) – The filename to save the figure. Default is None, meaning the figure is not saved. If provided, the filename must include the extension.
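
The seq_dict layout described above can be sketched as follows. The sequences, names, and colors are made up for illustration, and check_seq_dict is a hypothetical helper rather than part of finches. The commented call at the end assumes the return order listed under Returns.

```python
# Keys are sequence names; each value is a 2-position list of
# [sequence, color]. All entries here are made-up placeholders.
seq_dict = {
    "construct_A": ["MSGSGRGGYGGSRGGYGGSGS", "blue"],
    "construct_B": ["MSEEKKEEKKEEKKEEKKSGS", "red"],
}


def check_seq_dict(seq_dict):
    """Hypothetical pre-flight check of the expected seq_dict layout."""
    for name, entry in seq_dict.items():
        if len(entry) != 2:
            raise ValueError(f"{name}: expected [sequence, color]")
        sequence, color = entry
        if not isinstance(sequence, str) or not sequence:
            raise ValueError(f"{name}: sequence must be a non-empty string")
        if not isinstance(color, str):
            raise ValueError(f"{name}: color must be a string")
    return list(seq_dict)


names = check_seq_dict(seq_dict)
print(names)  # ['construct_A', 'construct_B']

# With finches installed, the call would then look like:
#   from finches import Mpipi_frontend
#   mf = Mpipi_frontend()
#   B, fig, ax = mf.plot_multiple_phase_diagrams(
#       seq_dict, filename="phase_diagrams.pdf")
```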

Returns:

[0] - The complex tuple associated with the phase diagram (see the function signature of build_phase_diagram for more details)

[1] - Figure object

[2] - Axes object

Return type:

Tuple

plot_phase_diagram(seq, use_aliphatic_weighting=True, use_charge_weighting=True, line_color='k', line_style='-', line_width=0.5, xlim=None, ylim=None, xlog=False, width=2.2, height=1.2, filename=None)

Function to plot the phase diagram for a given sequence. This is done by calculating the overall epsilon for the sequence, and then combining this with closed-form expressions for the binodal.

Parameters:
  • seq (str) – The protein sequence

  • use_aliphatic_weighting (bool) – Whether to use the aliphatic weighting scheme. Default is True.

  • use_charge_weighting (bool) – Whether to use the charge weighting scheme. Default is True.

  • xlim (tuple) – The x-axis limits. Default is None, meaning they are determined automatically. If provided, must be a 2-position tuple, e.g. (xmin, xmax).

  • ylim (tuple) – The y-axis limits. Default is None, meaning they are determined automatically. If provided, must be a 2-position tuple, e.g. (ymin, ymax).

  • xlog (bool) – Whether to plot the x-axis in log scale. Default is False.

  • width (float) – The width of the figure in inches. Default is 2.2.

  • height (float) – The height of the figure in inches. Default is 1.2.

  • filename (str) – The filename to save the figure. Default is None, meaning the figure is not saved. If provided, the filename must include the extension.

Returns:

[0] - The complex tuple associated with the phase diagram (see the function signature of build_phase_diagram for more details)

[1] - Figure object

[2] - Axes object

Return type:

Tuple

plot_protein_nucleic_vector(seq, fragsize=21, smoothing_window=30, poly_order=3, domains=[], domain_color='yellow', domain_alpha=0.3, trace_width=3, vmin=-0.8, vmax=0.8, tic_frequency=100, cmap='PRGn', fname=None, zero_folded=True, show_grid=False, figsize=(4, 1.5), ylim=[-1.2, 1.2])

Function to plot the per-residue attractive vector for a given protein sequence. This is calculated as the sliding-window average of a fragsize region of the protein with a fragsize region of poly-U RNA.
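
To make the sliding-window idea concrete, here is a generic centered window average over a list of per-residue scores. It is a sketch under stated assumptions, not the finches calculation: the real method evaluates each protein fragment against a poly-U fragment, whereas the scores here are made up, and sliding_window_mean is a hypothetical helper. It shows why an odd window size is convenient: every window then has a single central residue.

```python
def sliding_window_mean(values, fragsize=21):
    """Centered sliding-window mean without padding.

    Returns the index of each window's central residue and the mean of
    the window. Edges are not padded, so the first and last
    (fragsize - 1) // 2 residues receive no value.
    """
    if fragsize % 2 == 0:
        raise ValueError("fragsize must be odd")
    if fragsize > len(values):
        raise ValueError("fragsize cannot exceed the sequence length")
    half = (fragsize - 1) // 2
    centers = []
    means = []
    for center in range(half, len(values) - half):
        window = values[center - half:center + half + 1]
        centers.append(center)
        means.append(sum(window) / fragsize)
    return centers, means


# Made-up per-residue scores for a 7-residue toy sequence.
scores = [1.0, 1.0, 1.0, -2.0, -2.0, -2.0, 1.0]
centers, means = sliding_window_mean(scores, fragsize=3)

print(centers)  # [1, 2, 3, 4, 5]
print(means)    # [1.0, 0.0, -1.0, -2.0, -1.0]
```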

Parameters:
  • seq (str) – The protein sequence

  • fragsize (int) – The size of the sliding window. Must be an odd number. Default is 21.

  • smoothing_window (int) – The window size for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False, no smoothing is applied. Default is 30.

  • poly_order (int) – The polynomial order for the Savgol filter. This is used to smooth the per-residue attractive vector. If set to False, no smoothing is applied. Default is 3.

  • domains (list of tuples) – A list of tuples containing the start and end indices of folded domains in the protein sequence. These will be shaded in the plot.

  • domain_color (str) – The color to use for the passed domains. Default is ‘yellow’.

  • domain_alpha (float) – The alpha value for the passed domains. Default is 0.3.

  • trace_width (float) – The width of the line for the per-residue attractive vector. Default is 3.

  • vmin (float) – The minimum value for the color scale. Default is -0.8.

  • vmax (float) – The maximum value for the color scale. Default is 0.8.

  • cmap (str) – The colormap to use for the plot. Default is ‘PRGn’.

  • fname (str) – The filename to save the plot to. If set to None, the plot will be displayed in the console. Default is None.

  • zero_folded (bool) – If True, the folded domains will be shaded in the plot. Default is True.

  • show_grid (bool) – If True, a grid will be displayed on the plot. Default is False.

  • figsize (tuple) – The size of the figure. Default is (4, 1.5).

  • ylim (list) – The y-axis limits for the plot. Default is [-1.2,1.2].

Returns:

fig, ax – The matplotlib figure and axis objects. If fname is set to None, the plot will be displayed in the console. If fname is set to a filename, the plot will be saved to that file.

Return type:

matplotlib figure and axis objects. If fname is set to None, the plot