<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Asif Rahman</title><link>https://asifr.com/notes/</link><description>Recent content by Asif Rahman</description><generator>Hugo</generator><language>en-us</language><copyright>Copyright © 2025, Asif Rahman.</copyright><lastBuildDate>Tue, 02 Dec 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://asifr.com/notes/feed.xml" rel="self" type="application/rss+xml"/><item><title>ChartQL - A token-efficient SQL extension for generating Plotly charts</title><link>https://asifr.com/chartql-token-efficient-chart-query-language/</link><pubDate>Fri, 12 Sep 2025 00:00:00 +0000</pubDate><guid>https://asifr.com/chartql-token-efficient-chart-query-language/</guid><description>
&lt;p>Most LLM-generated chart code is wasteful (&amp;ldquo;token inefficient&amp;rdquo;). When you ask a model to create a Plotly chart, it spends most of its output tokens on boilerplate for what is fundamentally a simple mapping: take these columns, plot them as this chart type, apply these visual options.&lt;/p>
&lt;p>ChartQL compresses that entire specification into a few lines. The model only needs to produce a SQL query with a &lt;code>PLOT AS&lt;/code> clause, and the execution engine handles the rest. In my experiments, a typical Plotly chart specification ran 400-600 tokens. The equivalent ChartQL was 40-80, roughly a 10x reduction.&lt;/p>
&lt;p>This is especially important in agentic systems where we want to reduce the number of tool calls. The standard flow would be to write SQL to get the data, then write the chart specification. With ChartQL, a single tool call handles both the query and the chart definition. Fewer tool calls means fewer round trips, lower latency, and less opportunity for the model to introduce errors between steps.&lt;/p>
&lt;p>Plotly has an excellent reference doc and a machine-readable specification that describes all the options for laying out and formatting a chart. But with so many options, a full Plotly specification can go wrong in many ways: mismatched list lengths, incorrectly nested layout dicts, and wrong trace types. ChartQL&amp;rsquo;s grammar is small enough that a model can reliably produce valid output, and when it doesn&amp;rsquo;t, the parser reports the precise error location in the chart definition, so it&amp;rsquo;s easy to fix.&lt;/p>
&lt;p>ChartQL allows you to embed Plotly chart definitions directly within SQL queries using the &lt;code>PLOT AS&lt;/code> syntax.&lt;/p>
&lt;p>You can customize chart appearance with Plotly layout options written in Plotly&amp;rsquo;s &lt;a href="https://plotly.com/python/creating-and-updating-figures/#magic-underscore-notation">magic underscore&lt;/a> notation, which makes it easier to work with nested properties. Supported plot types:&lt;/p>
&lt;ol>
&lt;li>&lt;code>line&lt;/code> - Line charts with markers&lt;/li>
&lt;li>&lt;code>bar&lt;/code> - Bar charts (vertical/horizontal)&lt;/li>
&lt;li>&lt;code>scatter&lt;/code> - Scatter plots&lt;/li>
&lt;li>&lt;code>pie&lt;/code> - Pie charts&lt;/li>
&lt;li>&lt;code>heatmap&lt;/code> - Heat maps&lt;/li>
&lt;li>&lt;code>histogram&lt;/code> - Histogram plots&lt;/li>
&lt;li>&lt;code>box&lt;/code> - Box plots&lt;/li>
&lt;/ol>
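&lt;p>As a rough illustration of what the underscore notation stands for, the sketch below expands flat keys into nested dicts. The helper name &lt;code>expand_underscores&lt;/code> is hypothetical and is not part of ChartQL or Plotly. It assumes every underscore is a nesting separator, which is a simplification, since some Plotly property names (such as &lt;code>paper_bgcolor&lt;/code>) contain underscores themselves.&lt;/p>

```python
from typing import Any, Dict


def expand_underscores(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Expand keys such as 'title_text' into nested dicts.

    Simplification (assumption): every underscore is a nesting separator,
    so property names that contain an underscore are not handled, and a key
    cannot be both a plain value and a parent of other keys.
    """
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        parts = key.split("_")
        node = nested
        # Walk (or create) the intermediate dicts, then set the leaf value.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Keys as they would look once a "layout_" prefix has been removed.
layout = expand_underscores(
    {"title_text": "Active Users Over Time", "xaxis_type": "date", "margin_b": 50}
)
print(layout)
```

&lt;p>Read this way, a key such as &lt;code>title_text&lt;/code> addresses the &lt;code>text&lt;/code> property nested inside &lt;code>title&lt;/code>.&lt;/p>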
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-sql" data-lang="sql">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">SELECT&lt;/span> end_date, n_reports &lt;span style="color:#f00">FROM&lt;/span> metrics
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">WHERE&lt;/span> measure = &lt;span style="color:#87ceeb">&amp;#39;active_users&amp;#39;&lt;/span> &lt;span style="color:#f00">AND&lt;/span> adjustment = &lt;span style="color:#87ceeb">&amp;#39;total&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>PLOT &lt;span style="color:#f00">AS&lt;/span> line(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> x=end_date,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> y=n_reports,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trace_marker_color=&lt;span style="color:#87ceeb">&amp;#39;blue&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trace_marker_size=&lt;span style="color:#f60">8&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout_title_text=&lt;span style="color:#87ceeb">&amp;#39;Active Users Over Time&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout_xaxis_type=&lt;span style="color:#87ceeb">&amp;#39;date&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout_xaxis_title=&lt;span style="color:#87ceeb">&amp;#39;Date&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout_yaxis_title=&lt;span style="color:#87ceeb">&amp;#39;Number of Active Users&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout_height=&lt;span style="color:#f60">400&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout_width=&lt;span style="color:#f60">700&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout_margin_b=&lt;span style="color:#f60">50&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Overall architecture:&lt;/p>
&lt;ul>
&lt;li>sqlparse tokenizes the SQL query to locate the &lt;code>PLOT AS&lt;/code> clause.&lt;/li>
&lt;li>The SQL query is executed to fetch data. The parser does not execute the query itself but relies on an execution engine.&lt;/li>
&lt;li>The chart definition is parsed to identify the chart type and its parameters.&lt;/li>
&lt;li>The chart generator returns a Plotly chart specification based on the data and chart type: {data, layout, config}.
&lt;ul>
&lt;li>Anything with a layout_ or config_ prefix is treated as a layout or config option&lt;/li>
&lt;li>Column names can be used directly as variables in the chart definition, like line(x=column_name, y=other_column_name)
&lt;ul>
&lt;li>Only in fields which are expected to be data fields, including: x, y, z, color, values, labels, text, etc.&lt;/li>
&lt;li>Multiple columns can be used, e.g. y=[col1, col2, col3] for multiple traces&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Trace-specific options can be set using the trace_ prefix, e.g. trace_marker_color and trace_marker_size add
those options to each trace&amp;rsquo;s marker dict.
&lt;ul>
&lt;li>When multiple columns are used, trace options can be given as lists with one entry per trace, like
trace_marker_color=[&amp;#39;red&amp;#39;, &amp;#39;blue&amp;#39;, &amp;#39;green&amp;#39;]&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
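&lt;p>The multi-column rule above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation from this post: the function name &lt;code>build_traces&lt;/code>, its signature, and the sample rows are all made up for the example, and it only covers line charts with the &lt;code>trace_&lt;/code> prefix already removed from the option keys.&lt;/p>

```python
from typing import Any, Dict, List


def build_traces(rows: List[Dict[str, Any]], x: str, y: List[str],
                 trace_options: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one Plotly-style trace dict per y column (hypothetical helper)."""
    traces = []
    for i, column in enumerate(y):
        trace: Dict[str, Any] = {
            "type": "scatter",
            "mode": "lines+markers",
            "name": column,
            "x": [row[x] for row in rows],
            "y": [row[column] for row in rows],
        }
        for key, value in trace_options.items():
            if isinstance(value, list):
                # A list must supply exactly one entry per trace.
                if len(value) != len(y):
                    raise ValueError(f"{key}: expected {len(y)} values, got {len(value)}")
                chosen = value[i]
            else:
                # A scalar applies to every trace.
                chosen = value
            # Assumption: "marker_color" means trace["marker"]["color"].
            *parents, leaf = key.split("_")
            node = trace
            for parent in parents:
                node = node.setdefault(parent, {})
            node[leaf] = chosen
        traces.append(trace)
    return traces


# Made-up sample rows, standing in for the result of the SQL query.
rows = [
    {"day": "2025-01-01", "signups": 3, "logins": 10},
    {"day": "2025-01-02", "signups": 5, "logins": 12},
]
traces = build_traces(
    rows, x="day", y=["signups", "logins"],
    trace_options={"marker_color": ["red", "blue"], "marker_size": 8},
)
print(traces[0]["marker"], traces[1]["marker"])
```

&lt;p>Checking list lengths up front turns a mismatched option list into an immediate, named error rather than a malformed chart.&lt;/p>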
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> sqlparse
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> polars &lt;span style="color:#f00">as&lt;/span> pl
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> sqlparse.tokens &lt;span style="color:#f00">import&lt;/span> Text
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> lark &lt;span style="color:#f00">import&lt;/span> Lark, Transformer
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> lark.exceptions &lt;span style="color:#f00">import&lt;/span> UnexpectedToken
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> typing &lt;span style="color:#f00">import&lt;/span> Dict, Any, Tuple, List, Union, Callable
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> ChartQLException(Exception):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Base exception for ChartQL parsing and processing errors.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">pass&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> ChartQLParseError(ChartQLException):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Enhanced parsing error with visual context showing the error location.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self, original_error: UnexpectedToken, chart_spec: str):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.original_error = original_error
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.chart_spec = chart_spec
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> super().&lt;span style="color:#ff0">__init__&lt;/span>(self._format_error())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_format_error&lt;/span>(self) -&amp;gt; str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lines = self.chart_spec.splitlines()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> error_line_idx = self.original_error.line - &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#0f0"># Convert to 0-based&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> error_col = self.original_error.column - &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Get the original error message&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> message = str(self.original_error)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Add visual context if we can locate the line&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> &lt;span style="color:#f60">0&lt;/span> &amp;lt;= error_line_idx &amp;lt; len(lines):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> line_content = lines[error_line_idx]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pointer = &lt;span style="color:#87ceeb">&amp;#34; &amp;#34;&lt;/span> * error_col + &lt;span style="color:#87ceeb">&amp;#34;^&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> message += &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">\n&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>line_content&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">\n&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>pointer&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> message
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">find_plot_as_position&lt;/span>(tokens: List) -&amp;gt; int:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Find the position of PLOT AS clause in token list. Returns -1 if not found.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(len(tokens) - &lt;span style="color:#f60">2&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tokens[i].value.upper() == &lt;span style="color:#87ceeb">&amp;#34;PLOT&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> and tokens[i + &lt;span style="color:#f60">1&lt;/span>].ttype in (&lt;span style="color:#f00">None&lt;/span>, Text.Whitespace)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> and tokens[i + &lt;span style="color:#f60">2&lt;/span>].value.upper() == &lt;span style="color:#87ceeb">&amp;#34;AS&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> i
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> -&lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">split_chartql_query&lt;/span>(query: str) -&amp;gt; Tuple[str, str]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Split ChartQL query into SQL and chart specification parts.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tokens = list(sqlparse.parse(query)[&lt;span style="color:#f60">0&lt;/span>].flatten())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> plot_as_pos = find_plot_as_position(tokens)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> plot_as_pos == -&lt;span style="color:#f60">1&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ChartQLException(&lt;span style="color:#87ceeb">&amp;#34;No PLOT AS clause found&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sql_tokens = tokens[:plot_as_pos]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chart_tokens = tokens[plot_as_pos + &lt;span style="color:#f60">3&lt;/span> :]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sql_query = &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>.join(token.value &lt;span style="color:#f00">for&lt;/span> token in sql_tokens).strip()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chart_spec = &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>.join(token.value &lt;span style="color:#f00">for&lt;/span> token in chart_tokens).strip()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> sql_query, chart_spec
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">has_chartql_plot&lt;/span>(query: str) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Check if query contains a PLOT AS clause.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tokens = list(sqlparse.parse(query)[&lt;span style="color:#f60">0&lt;/span>].flatten())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> find_plot_as_position(tokens) != -&lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>chartql_grammar = &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> start: chart_type &amp;#34;(&amp;#34; parameter_list? &amp;#34;)&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> chart_type: CHART_TYPE
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> parameter_list: parameter (&amp;#34;,&amp;#34; parameter)*
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> parameter: key &amp;#34;=&amp;#34; value
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key: CNAME
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> value: string | number | boolean | list | column_ref
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> list: &amp;#34;[&amp;#34; value (&amp;#34;,&amp;#34; value)* &amp;#34;]&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> string: SINGLE_STRING | DOUBLE_STRING
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> number: SIGNED_NUMBER
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> !boolean: &amp;#34;True&amp;#34; | &amp;#34;False&amp;#34; | &amp;#34;true&amp;#34; | &amp;#34;false&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> column_ref: CNAME
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> CHART_TYPE: &amp;#34;line&amp;#34; | &amp;#34;bar&amp;#34; | &amp;#34;scatter&amp;#34; | &amp;#34;pie&amp;#34; | &amp;#34;heatmap&amp;#34; | &amp;#34;histogram&amp;#34; | &amp;#34;box&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> SINGLE_STRING: /&amp;#39;[^&amp;#39;]*&amp;#39;/
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> DOUBLE_STRING: /&amp;#34;[^&amp;#34;]*&amp;#34;/
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> COMMENT: /--[^&lt;/span>&lt;span style="color:#87ceeb">\\&lt;/span>&lt;span style="color:#87ceeb">n&lt;/span>&lt;span style="color:#87ceeb">\\&lt;/span>&lt;span style="color:#87ceeb">r]*/
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">%i&lt;/span>&lt;span style="color:#87ceeb">mport common.SIGNED_NUMBER
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">%i&lt;/span>&lt;span style="color:#87ceeb">mport common.CNAME
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">%i&lt;/span>&lt;span style="color:#87ceeb">mport common.WS
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">%i&lt;/span>&lt;span style="color:#87ceeb">gnore WS
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">%i&lt;/span>&lt;span style="color:#87ceeb">gnore COMMENT
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> ChartQLTransformer(Transformer):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">start&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chart_type, params = items[&lt;span style="color:#f60">0&lt;/span>], items[&lt;span style="color:#f60">1&lt;/span>] &lt;span style="color:#f00">if&lt;/span> len(items) &amp;gt; &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#f00">else&lt;/span> {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {&lt;span style="color:#87ceeb">&amp;#34;chart_type&amp;#34;&lt;/span>: chart_type, &lt;span style="color:#87ceeb">&amp;#34;parameters&amp;#34;&lt;/span>: params}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">chart_type&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> str(items[&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">parameter_list&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> dict(items)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">parameter&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> (items[&lt;span style="color:#f60">0&lt;/span>], items[&lt;span style="color:#f60">1&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">key&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> str(items[&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">list&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> list(items)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">string&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> str(items[&lt;span style="color:#f60">0&lt;/span>])[&lt;span style="color:#f60">1&lt;/span>:-&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">number&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> val = str(items[&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> float(val) &lt;span style="color:#f00">if&lt;/span> any(c in val &lt;span style="color:#f00">for&lt;/span> c in &lt;span style="color:#87ceeb">&amp;#34;.eE&amp;#34;&lt;/span>) &lt;span style="color:#f00">else&lt;/span> int(val)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">boolean&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not items:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> str(items[&lt;span style="color:#f60">0&lt;/span>]).lower() == &lt;span style="color:#87ceeb">&amp;#34;true&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">column_ref&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> str(items[&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">value&lt;/span>(self, items):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> items[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">parse_chartql&lt;/span>(chart_spec: str) -&amp;gt; Dict[str, Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Parse chart specification into structured dictionary.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> parser = Lark(chartql_grammar, parser=&lt;span style="color:#87ceeb">&amp;#34;lalr&amp;#34;&lt;/span>, transformer=ChartQLTransformer())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> parser.parse(chart_spec)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> UnexpectedToken &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ChartQLParseError(e, chart_spec) &lt;span style="color:#f00">from&lt;/span> None
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>DataInput = Union[List[Dict[str, Any]], pl.DataFrame]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> ChartSpecGenerator:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.chart_defaults = {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;line&amp;#34;&lt;/span>: {&lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;scatter&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;mode&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;lines+markers&amp;#34;&lt;/span>},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;bar&amp;#34;&lt;/span>: {&lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;bar&amp;#34;&lt;/span>},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;scatter&amp;#34;&lt;/span>: {&lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;scatter&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;mode&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;markers&amp;#34;&lt;/span>},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;pie&amp;#34;&lt;/span>: {&lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;pie&amp;#34;&lt;/span>},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;heatmap&amp;#34;&lt;/span>: {&lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;heatmap&amp;#34;&lt;/span>},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;histogram&amp;#34;&lt;/span>: {&lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;histogram&amp;#34;&lt;/span>},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;box&amp;#34;&lt;/span>: {&lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;box&amp;#34;&lt;/span>},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">generate_spec&lt;/span>(self, parsed_chart: Dict[str, Any], data: DataInput) -&amp;gt; Dict[str, Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chart_type = parsed_chart[&lt;span style="color:#87ceeb">&amp;#34;chart_type&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> parameters = parsed_chart[&lt;span style="color:#87ceeb">&amp;#34;parameters&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout_params = self._extract_prefixed_params(parameters, &lt;span style="color:#87ceeb">&amp;#34;layout_&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> config_params = self._extract_prefixed_params(parameters, &lt;span style="color:#87ceeb">&amp;#34;config_&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trace_params = self._extract_prefixed_params(parameters, &lt;span style="color:#87ceeb">&amp;#34;trace_&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_params = self._extract_data_params(parameters)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> traces = self._build_traces(chart_type, data_params, trace_params, data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout = self._build_layout(layout_params)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> config = self._build_config(config_params)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {&lt;span style="color:#87ceeb">&amp;#34;data&amp;#34;&lt;/span>: traces, &lt;span style="color:#87ceeb">&amp;#34;layout&amp;#34;&lt;/span>: layout, &lt;span style="color:#87ceeb">&amp;#34;config&amp;#34;&lt;/span>: config}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_extract_prefixed_params&lt;/span>(self, parameters: Dict[str, Any], prefix: str) -&amp;gt; Dict[str, Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> key, value in parameters.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> key.startswith(prefix):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> new_key = key[len(prefix) :]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result[new_key] = value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> result
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_extract_data_params&lt;/span>(self, parameters: Dict[str, Any]) -&amp;gt; Dict[str, Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_fields = {&lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;z&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;color&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;values&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;labels&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;text&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> key, value in parameters.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> key in data_fields:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result[key] = value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> result
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_build_traces&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self, chart_type: str, data_params: Dict[str, Any], trace_params: Dict[str, Any], data: DataInput
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; List[Dict[str, Any]]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> base_trace = self.chart_defaults[chart_type].copy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> primary_data_col = self._get_primary_data_column(chart_type)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_columns = data_params.get(primary_data_col, [])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not isinstance(data_columns, list):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_columns = [data_columns]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> traces = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, col in enumerate(data_columns):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trace = base_trace.copy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._apply_data_mappings(trace, chart_type, data_params, data, i)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._apply_trace_params(trace, trace_params, i)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trace[&lt;span style="color:#87ceeb">&amp;#34;name&amp;#34;&lt;/span>] = col &lt;span style="color:#f00">if&lt;/span> len(data_columns) &amp;gt; &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#f00">else&lt;/span> chart_type
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> traces.append(trace)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> traces
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_apply_trace_params&lt;/span>(self, trace: Dict[str, Any], trace_params: Dict[str, Any], trace_index: int):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> key, value in trace_params.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(value, list) and len(value) &amp;gt; trace_index:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._set_nested_dict(trace, key, value[trace_index])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> not isinstance(value, list):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._set_nested_dict(trace, key, value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_apply_data_mappings&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self, trace: Dict[str, Any], chart_type: str, data_params: Dict[str, Any], data: DataInput, trace_index: int
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_field_mappings = {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;line&amp;#34;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;scatter&amp;#34;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;color&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;text&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;bar&amp;#34;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;pie&amp;#34;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#34;values&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;labels&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;heatmap&amp;#34;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;z&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;histogram&amp;#34;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;box&amp;#34;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> field in data_field_mappings.get(chart_type, []):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> field in data_params:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> col_ref = data_params[field]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(col_ref, list):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> primary_col = self._get_primary_data_column(chart_type)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> field == primary_col and len(col_ref) &amp;gt; trace_index:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trace[field] = self._get_column_data(data, col_ref[trace_index])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trace[field] = self._get_column_data(data, col_ref)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_get_primary_data_column&lt;/span>(self, chart_type: str) -&amp;gt; str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> primary_cols = {&lt;span style="color:#87ceeb">&amp;#34;line&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;scatter&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;bar&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;pie&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;values&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;histogram&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;box&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> primary_cols.get(chart_type, &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_get_column_data&lt;/span>(self, data: DataInput, column: str) -&amp;gt; List[Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Extract column data as list from either Polars DataFrame or list of dicts.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> pl and isinstance(data, pl.DataFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> data[column].to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> isinstance(data, list):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> [row.get(column) &lt;span style="color:#f00">for&lt;/span> row in data]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ChartQLException(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Unsupported data type: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>type(data)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_get_columns&lt;/span>(self, data: DataInput) -&amp;gt; List[str]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Get column names from data.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> pl and isinstance(data, pl.DataFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> data.columns
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> isinstance(data, list) and data:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> list(data[&lt;span style="color:#f60">0&lt;/span>].keys())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_build_layout&lt;/span>(self, layout_params: Dict[str, Any]) -&amp;gt; Dict[str, Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> layout = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> key, value in layout_params.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._set_nested_dict(layout, key, value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> layout
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_build_config&lt;/span>(self, config_params: Dict[str, Any]) -&amp;gt; Dict[str, Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> config_params.copy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_set_nested_dict&lt;/span>(self, d: Dict[str, Any], key: str, value: Any):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not key:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys = key.split(&lt;span style="color:#87ceeb">&amp;#34;_&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not keys:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current = d
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> k in keys[:-&lt;span style="color:#f60">1&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> k not in current:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current[k] = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current = current[k]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current[keys[-&lt;span style="color:#f60">1&lt;/span>]] = value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> DataMapper:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self, data: DataInput):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.data = data
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.columns = set(self._get_columns(data))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">resolve_column_references&lt;/span>(self, parameters: Dict[str, Any]) -&amp;gt; Dict[str, Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> resolved = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> key, value in parameters.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self._is_data_field(key):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> resolved[key] = self._resolve_value(value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> resolved[key] = value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> resolved
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_is_data_field&lt;/span>(self, key: str) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_fields = {&lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;y&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;z&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;color&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;values&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;labels&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;text&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> key in data_fields
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_resolve_value&lt;/span>(self, value: Any) -&amp;gt; Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(value, str) and value in self.columns:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> isinstance(value, list):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> [self._resolve_value(v) &lt;span style="color:#f00">for&lt;/span> v in value]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">validate_columns&lt;/span>(self, parameters: Dict[str, Any]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> missing_columns = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> key, value in parameters.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self._is_data_field(key):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> columns = self._extract_column_names(value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> col in columns:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> col not in self.columns:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> missing_columns.append(col)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> missing_columns:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ChartQLException(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Columns not found in data: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>missing_columns&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_extract_column_names&lt;/span>(self, value: Any) -&amp;gt; List[str]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(value, str):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> [value] &lt;span style="color:#0f0"># keep unknown names so validate_columns can report them as missing&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> isinstance(value, list):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> columns = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> v in value:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> columns.extend(self._extract_column_names(v))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> columns
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_get_columns&lt;/span>(self, data: DataInput) -&amp;gt; List[str]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Get column names from data.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> pl and isinstance(data, pl.DataFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> data.columns
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> isinstance(data, list) and data:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> list(data[&lt;span style="color:#f60">0&lt;/span>].keys())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">process_chartql_query&lt;/span>(query: str, sql_executor: Callable) -&amp;gt; Dict[str, Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Process a complete ChartQL query and return Plotly specification.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sql_query, chart_spec = split_chartql_query(query)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data = sql_executor(sql_query)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> parsed_chart = parse_chartql(chart_spec)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_mapper = DataMapper(data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_mapper.validate_columns(parsed_chart[&lt;span style="color:#87ceeb">&amp;#34;parameters&amp;#34;&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> resolved_params = data_mapper.resolve_column_references(parsed_chart[&lt;span style="color:#87ceeb">&amp;#34;parameters&amp;#34;&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> spec_generator = ChartSpecGenerator()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> plotly_spec = spec_generator.generate_spec(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {&lt;span style="color:#87ceeb">&amp;#34;chart_type&amp;#34;&lt;/span>: parsed_chart[&lt;span style="color:#87ceeb">&amp;#34;chart_type&amp;#34;&lt;/span>], &lt;span style="color:#87ceeb">&amp;#34;parameters&amp;#34;&lt;/span>: resolved_params}, data
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> plotly_spec
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Installing Local Python Packages with uv tool</title><link>https://asifr.com/uv-tool-local-install/</link><pubDate>Mon, 14 Jul 2025 00:00:00 +0000</pubDate><guid>https://asifr.com/uv-tool-local-install/</guid><description>
&lt;p>&lt;a href="https://docs.astral.sh/uv/">Astral&amp;rsquo;s uv&lt;/a> Python package manager has a concept of &lt;a href="https://docs.astral.sh/uv/concepts/tools/">tools&lt;/a>, which are Python packages that provide command-line interfaces. Each tool is installed into its own isolated environment, so its dependencies do not conflict with other packages. For example, I can install my &lt;a href="https://github.com/asifr/presskit">presskit&lt;/a> package as a tool and its command will be available on my PATH from any directory: &lt;code>uv tool install presskit&lt;/code>.&lt;/p>
&lt;p>During development, I often want to test changes to my package without having to publish it to PyPI. &lt;code>uv tool&lt;/code> allows me to install a local package or a built distribution directly, making it easy to test changes. Here are a few ways to do it:&lt;/p>
&lt;p>Install from local directory&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>uv tool install .
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Install from built distribution&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># First build the package&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>source .venv/bin/activate &amp;amp;&amp;amp; python -m build
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Then install the built wheel&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>uv tool install dist/presskit-*.whl
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Install in editable mode for development&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>uv tool install --editable .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Install with optional extras&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>uv tool install --editable &lt;span style="color:#87ceeb">&amp;#34;.[dev,docs]&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>--editable&lt;/code> flag is particularly useful during development as changes to my source code will be reflected immediately without reinstalling.&lt;/p>
&lt;p>Install from git repository&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>uv tool install git+https://github.com/asifr/presskit.git
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Disk-based caching with asset management in Python</title><link>https://asifr.com/disk-cache/</link><pubDate>Tue, 17 Jun 2025 00:00:00 +0000</pubDate><guid>https://asifr.com/disk-cache/</guid><description>
&lt;p>The Python &lt;a href="https://grantjenks.com/docs/diskcache/">DiskCache&lt;/a> library is my go-to solution for disk-based caching. It uses a SQLite database to store small values and stores larger values as pickle files on disk. It has expiration and eviction policies and a convenient &lt;code>@memoize&lt;/code> decorator for caching function results.&lt;/p>
&lt;p>I use Polars and NumPy for data processing, and both have more efficient native storage formats than pickle, such as Parquet, Arrow IPC, and &lt;code>.npy&lt;/code>. Below is an extension to DiskCache that adds support for these formats using a new &lt;code>@asset&lt;/code> decorator. The &lt;code>Cache&lt;/code> class uses duck typing to determine the type of data returned by the decorated function and stores it in the appropriate format.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>cache = Cache(directory=&lt;span style="color:#87ceeb">&amp;#34;./cache&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">filter_weather_stations&lt;/span>(df: pl.DataFrame) -&amp;gt; pl.DataFrame:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> df.filter(pl.col(&lt;span style="color:#87ceeb">&amp;#34;country_code&amp;#34;&lt;/span>) == &lt;span style="color:#87ceeb">&amp;#34;US&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>@asset&lt;/code> decorator automatically detects that the return value is a Polars DataFrame and stores it in &lt;code>.parquet&lt;/code> format. PyArrow tables are stored as &lt;code>.arrow&lt;/code> files in Arrow IPC format, and NumPy arrays are stored in &lt;code>.npy&lt;/code> format. Dictionaries and lists are stored as JSON. Any other data type falls back to pickle. The &lt;code>Cache&lt;/code> class can be extended to support more formats.&lt;/p>
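&lt;p>To make the dispatch idea concrete, here is a minimal standard-library sketch of type-based handler selection with a pickle fallback. It is not the implementation at the end of this note: the handler classes, the &lt;code>extension&lt;/code> attribute, and the &lt;code>store&lt;/code> and &lt;code>pick_handler&lt;/code> functions are hypothetical names chosen for illustration, and only JSON and pickle are covered so the example has no third-party imports.&lt;/p>

```python
import json
import pickle
from pathlib import Path


class JSONHandler:
    """Hypothetical handler: stores dicts and lists as JSON text."""

    extension = ".json"

    def can_handle(self, data):
        return isinstance(data, (dict, list))

    def save(self, data, path):
        # json.dumps runs first, so nothing is written if encoding fails.
        Path(path).write_text(json.dumps(data))

    def load(self, path):
        return json.loads(Path(path).read_text())


class PickleHandler:
    """Hypothetical catch-all handler used when nothing else matches."""

    extension = ".pkl"

    def can_handle(self, data):
        return True

    def save(self, data, path):
        Path(path).write_bytes(pickle.dumps(data))

    def load(self, path):
        return pickle.loads(Path(path).read_bytes())


# Order matters: specific handlers first, the pickle catch-all last.
HANDLERS = [JSONHandler(), PickleHandler()]


def pick_handler(data):
    """Return the first handler that accepts the value."""
    for handler in HANDLERS:
        if handler.can_handle(data):
            return handler
    raise TypeError("no handler accepted the value")


def store(data, directory, name):
    """Save data under directory and return the path that was written."""
    handler = pick_handler(data)
    path = Path(directory) / (name + handler.extension)
    try:
        handler.save(data, path)
    except (TypeError, ValueError):
        # For example a dict holding a set: JSON cannot encode it,
        # so retry with the pickle handler instead.
        handler = HANDLERS[-1]
        path = Path(directory) / (name + handler.extension)
        handler.save(data, path)
    return path
```

&lt;p>Keeping the handlers in an ordered list means a new format can be supported by inserting one more handler ahead of the pickle catch-all.&lt;/p>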
&lt;p>This is a complete reimplementation of the DiskCache core &lt;code>Cache&lt;/code> class. It adds the &lt;code>@asset&lt;/code> decorator along with bug fixes and hardening against edge cases, including race conditions, file name collisions, and thread-safety problems. The &lt;a href="#implementation-code">complete implementation&lt;/a> is at the end of this note.&lt;/p>
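&lt;p>As an illustration of the kind of hardening involved, one common way to avoid half-written files and temporary-name collisions is to write to a uniquely named temporary file and then rename it into place. The sketch below assumes that approach and uses only the standard library; &lt;code>atomic_write_bytes&lt;/code> is a hypothetical helper, not a function from the implementation in this note.&lt;/p>

```python
import contextlib
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(path, payload):
    """Write payload so readers never observe a half-written file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp picks a unique name, so concurrent writers cannot collide
    # on the temporary file even when they target the same final path.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        # The temp file lives in the target directory, so this rename
        # stays on one filesystem and replaces the target in one step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything went wrong, then re-raise.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    return path
```

&lt;p>With this pattern the last writer wins, and a reader sees either the old file or the new one, never a partial write.&lt;/p>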
&lt;p>Table of Contents:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#features">Features&lt;/a>&lt;/li>
&lt;li>&lt;a href="#basic-usage">Basic Usage&lt;/a>&lt;/li>
&lt;li>&lt;a href="#expiration-examples">Expiration Examples&lt;/a>&lt;/li>
&lt;li>&lt;a href="#memoization-decorator">Memoization Decorator&lt;/a>&lt;/li>
&lt;li>&lt;a href="#asset-handling-with-native-formats">Asset Handling with Native Formats&lt;/a>&lt;/li>
&lt;li>&lt;a href="#extensible-asset-handlers">Extensible Asset Handlers&lt;/a>&lt;/li>
&lt;li>&lt;a href="#clearing-cache-by-name-or-function">Clearing Cache by Name or Function&lt;/a>&lt;/li>
&lt;li>&lt;a href="#storage-backends">Storage Backends&lt;/a>&lt;/li>
&lt;li>&lt;a href="#eviction-policies">Eviction Policies&lt;/a>&lt;/li>
&lt;li>&lt;a href="#pickle-fallback">Pickle Fallback&lt;/a>&lt;/li>
&lt;li>&lt;a href="#advanced-configuration">Advanced Configuration&lt;/a>&lt;/li>
&lt;li>&lt;a href="#implementation-code">Implementation code&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="features">Features&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Disk and SQLite Storage&lt;/strong>: Hybrid storage using SQLite database for metadata and filesystem for large assets&lt;/li>
&lt;li>&lt;strong>Expiration Support&lt;/strong>: Set TTL (time-to-live) for cached items with automatic cleanup&lt;/li>
&lt;li>&lt;strong>Memoization&lt;/strong>: Function result caching with the @memoize decorator&lt;/li>
&lt;li>&lt;strong>Asset Handling&lt;/strong>: Store data in native formats (JSON, Parquet, Arrow, NumPy, etc.)&lt;/li>
&lt;li>&lt;strong>Extensible Handlers&lt;/strong>: Custom asset format handlers with automatic fallback to pickle&lt;/li>
&lt;li>&lt;strong>Key Management&lt;/strong>: Clear specific keys, function caches, or all items&lt;/li>
&lt;li>&lt;strong>Eviction Policies&lt;/strong>: LRU, LFU, least-recently-stored, or none&lt;/li>
&lt;li>&lt;strong>Transaction Safety&lt;/strong>: Thread- and process-safe operations with SQLite WAL mode&lt;/li>
&lt;/ul>
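&lt;p>The hybrid storage and expiration features can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the implementation below: &lt;code>TinyCache&lt;/code>, its table layout, and the &lt;code>MIN_FILE_SIZE&lt;/code> constant are hypothetical, every value is pickled, and eviction and cleanup of replaced files are left out.&lt;/p>

```python
import pickle
import sqlite3
import time
import uuid
from pathlib import Path

MIN_FILE_SIZE = 32 * 1024  # assumed threshold: 32KB


class TinyCache:
    """Illustrative only: SQLite rows for metadata, files for big values."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.directory / "cache.db"))
        # WAL mode lets readers proceed while a writer is active.
        self.db.execute("PRAGMA journal_mode=WAL")
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, expire_time REAL, "
            "filename TEXT, value BLOB)"
        )

    def set(self, key, value, expire=None):
        payload = pickle.dumps(value)
        expire_time = None if expire is None else time.time() + expire
        filename, blob = None, payload
        if len(payload) >= MIN_FILE_SIZE:
            # Large payload: write a file and keep only its name in SQLite.
            filename = uuid.uuid4().hex + ".pkl"
            (self.directory / filename).write_bytes(payload)
            blob = None
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO cache VALUES (?, ?, ?, ?)",
                (key, expire_time, filename, blob),
            )

    def get(self, key, default=None):
        row = self.db.execute(
            "SELECT expire_time, filename, value FROM cache WHERE key = ?",
            (key,),
        ).fetchone()
        if row is None:
            return default
        expire_time, filename, blob = row
        if expire_time is not None and time.time() >= expire_time:
            return default  # expired entries read as missing
        if filename is not None:
            blob = (self.directory / filename).read_bytes()
        return pickle.loads(blob)
```

&lt;p>Storing the expiration time as a column means a read can treat an expired row as missing, and a separate cleanup pass can delete such rows later.&lt;/p>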
&lt;h2 id="basic-usage">Basic Usage&lt;/h2>
&lt;p>Create a cache instance and store/retrieve data:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> explore.cache &lt;span style="color:#f00">import&lt;/span> Cache
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Create cache with default settings&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache = Cache()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Store with expiration (3600 seconds = 1 hour)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.set(&lt;span style="color:#87ceeb">&amp;#34;user:123&amp;#34;&lt;/span>, {&lt;span style="color:#87ceeb">&amp;#34;name&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;John&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;email&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;john@example.com&amp;#34;&lt;/span>}, expire=&lt;span style="color:#f60">3600&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Retrieve data&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>user_data = cache.get(&lt;span style="color:#87ceeb">&amp;#34;user:123&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(user_data) &lt;span style="color:#0f0"># {&amp;#39;name&amp;#39;: &amp;#39;John&amp;#39;, &amp;#39;email&amp;#39;: &amp;#39;john@example.com&amp;#39;}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Check if key exists&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">if&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;user:123&amp;#34;&lt;/span> in cache:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">&amp;#34;User data is cached&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Dictionary-style access&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache[&lt;span style="color:#87ceeb">&amp;#34;session:abc&amp;#34;&lt;/span>] = &lt;span style="color:#87ceeb">&amp;#34;session_data&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>session = cache[&lt;span style="color:#87ceeb">&amp;#34;session:abc&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="expiration-examples">Expiration Examples&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> time
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Set with 5 second expiration&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.set(&lt;span style="color:#87ceeb">&amp;#34;temp_key&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;temporary_value&amp;#34;&lt;/span>, expire=&lt;span style="color:#f60">5&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(cache.get(&lt;span style="color:#87ceeb">&amp;#34;temp_key&amp;#34;&lt;/span>)) &lt;span style="color:#0f0"># &amp;#34;temporary_value&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>time.sleep(&lt;span style="color:#f60">6&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(cache.get(&lt;span style="color:#87ceeb">&amp;#34;temp_key&amp;#34;&lt;/span>)) &lt;span style="color:#0f0"># None (expired)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Touch to extend expiration&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.set(&lt;span style="color:#87ceeb">&amp;#34;extend_key&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;value&amp;#34;&lt;/span>, expire=&lt;span style="color:#f60">10&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.touch(&lt;span style="color:#87ceeb">&amp;#34;extend_key&amp;#34;&lt;/span>, expire=&lt;span style="color:#f60">3600&lt;/span>) &lt;span style="color:#0f0"># Extend to 1 hour&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="memoization-decorator">Memoization Decorator&lt;/h2>
&lt;p>Cache expensive function results automatically:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>@cache.memoize(expire=&lt;span style="color:#f60">3600&lt;/span>, typed=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">fibonacci&lt;/span>(n):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">if&lt;/span> n &amp;lt;= &lt;span style="color:#f60">1&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> n
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> fibonacci(n-&lt;span style="color:#f60">1&lt;/span>) + fibonacci(n-&lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># First call computes and caches&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result1 = fibonacci(&lt;span style="color:#f60">100&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Subsequent calls return cached result&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result2 = fibonacci(&lt;span style="color:#f60">100&lt;/span>) &lt;span style="color:#0f0"># Much faster&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Access cache key for manual operations&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>key = fibonacci.__cache_key__(&lt;span style="color:#f60">100&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cached_value = cache[key]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Clear specific function&amp;#39;s cache&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.clear(fibonacci)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="asset-handling-with-native-formats">Asset Handling with Native Formats&lt;/h2>
&lt;p>Store data in native formats instead of pickle:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> pandas &lt;span style="color:#f00">as&lt;/span> pd
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> polars &lt;span style="color:#f00">as&lt;/span> pl
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> pyarrow &lt;span style="color:#f00">as&lt;/span> pa
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Pandas DataFrame stored as Parquet&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset(expire=&lt;span style="color:#f60">3600&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_sales_data&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pd.DataFrame({&lt;span style="color:#87ceeb">&amp;#39;sales&amp;#39;&lt;/span>: [&lt;span style="color:#f60">100&lt;/span>, &lt;span style="color:#f60">200&lt;/span>, &lt;span style="color:#f60">300&lt;/span>], &lt;span style="color:#87ceeb">&amp;#39;region&amp;#39;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#39;A&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;B&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;C&amp;#39;&lt;/span>]})
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>sales = get_sales_data() &lt;span style="color:#0f0"># Stored as .parquet file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Polars DataFrame stored as Parquet &lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset(expire=&lt;span style="color:#f60">3600&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_analytics_data&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pl.DataFrame({&lt;span style="color:#87ceeb">&amp;#39;metric&amp;#39;&lt;/span>: [&lt;span style="color:#f60">1.1&lt;/span>, &lt;span style="color:#f60">2.2&lt;/span>, &lt;span style="color:#f60">3.3&lt;/span>], &lt;span style="color:#87ceeb">&amp;#39;date&amp;#39;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#39;2023-01&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;2023-02&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;2023-03&amp;#39;&lt;/span>]})
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>analytics = get_analytics_data()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># NumPy array stored as .npy file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset(expire=&lt;span style="color:#f60">3600&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_model_weights&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> np.random.random((&lt;span style="color:#f60">1000&lt;/span>, &lt;span style="color:#f60">100&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>weights = get_model_weights()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># PyArrow Table stored as Arrow IPC format&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset(expire=&lt;span style="color:#f60">3600&lt;/span>, format=&lt;span style="color:#87ceeb">&amp;#34;arrow&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_arrow_table&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pa.table({&lt;span style="color:#87ceeb">&amp;#39;col1&amp;#39;&lt;/span>: [&lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">2&lt;/span>, &lt;span style="color:#f60">3&lt;/span>], &lt;span style="color:#87ceeb">&amp;#39;col2&amp;#39;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#39;a&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;b&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;c&amp;#39;&lt;/span>]})
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>table = get_arrow_table()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># JSON for dicts/lists (when JSON-serializable)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset(expire=&lt;span style="color:#f60">3600&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_config&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {&lt;span style="color:#87ceeb">&amp;#34;database&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;postgres&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;port&amp;#34;&lt;/span>: &lt;span style="color:#f60">5432&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;features&amp;#34;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#34;auth&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;cache&amp;#34;&lt;/span>]}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>config = get_config() &lt;span style="color:#0f0"># Stored as .json file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Force specific format&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset(expire=&lt;span style="color:#f60">3600&lt;/span>, format=&lt;span style="color:#87ceeb">&amp;#34;parquet&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_mixed_data&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Even if data could be JSON, force Parquet format&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pl.DataFrame({&lt;span style="color:#87ceeb">&amp;#39;values&amp;#39;&lt;/span>: [&lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">2&lt;/span>, &lt;span style="color:#f60">3&lt;/span>]})
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="extensible-asset-handlers">Extensible Asset Handlers&lt;/h2>
&lt;p>Create custom handlers for new data formats:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> CSVHandler(AssetHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    format_name = &lt;span style="color:#87ceeb">&amp;#34;csv&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle&lt;/span>(self, data):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> hasattr(data, &lt;span style="color:#87ceeb">&amp;#39;to_csv&amp;#39;&lt;/span>) &lt;span style="color:#0f0"># Pandas-like objects&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">save&lt;/span>(self, data, path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        data.to_csv(path, index=&lt;span style="color:#f00">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">load&lt;/span>(self, path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">import&lt;/span> pandas &lt;span style="color:#f00">as&lt;/span> pd
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> pd.read_csv(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_available&lt;/span>(self):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">import&lt;/span> pandas
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> ImportError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Register custom handler&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.register_asset_handler(CSVHandler())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Now DataFrames can be stored as CSV&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset(format=&lt;span style="color:#87ceeb">&amp;#34;csv&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_report&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> pd.DataFrame({&lt;span style="color:#87ceeb">&amp;#39;report&amp;#39;&lt;/span>: [&lt;span style="color:#87ceeb">&amp;#39;Q1&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;Q2&amp;#39;&lt;/span>], &lt;span style="color:#87ceeb">&amp;#39;revenue&amp;#39;&lt;/span>: [&lt;span style="color:#f60">100000&lt;/span>, &lt;span style="color:#f60">120000&lt;/span>]})
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="clearing-cache-by-name-or-function">Clearing Cache by Name or Function&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Clear all cache&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.clear()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Clear specific key&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.clear(&lt;span style="color:#87ceeb">&amp;#34;user:123&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Clear specific function&amp;#39;s cached results&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.clear(expensive_function)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Clear multiple items&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.clear(&lt;span style="color:#87ceeb">&amp;#34;key1&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;key2&amp;#34;&lt;/span>, my_function)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Clear with list&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache.clear([&lt;span style="color:#87ceeb">&amp;#34;session:abc&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;temp:xyz&amp;#34;&lt;/span>])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="storage-backends">Storage Backends&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># SQLite database for metadata and small values&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache = Cache(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> directory=&lt;span style="color:#87ceeb">&amp;#34;/path/to/cache&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dbname=&lt;span style="color:#87ceeb">&amp;#34;my_cache.db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sqlite_timeout=&lt;span style="color:#f60">60&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_file_size=&lt;span style="color:#f60">32768&lt;/span> &lt;span style="color:#0f0"># Store values &amp;lt; 32KB in SQLite, larger as files&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Large files stored on disk with appropriate extensions&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Small values stored as BLOBs in SQLite database&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Automatic file/database decision based on size&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="eviction-policies">Eviction Policies&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Least Recently Used (LRU)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache = Cache(eviction_policy=&lt;span style="color:#87ceeb">&amp;#34;least-recently-used&amp;#34;&lt;/span>, size_limit=&lt;span style="color:#f60">1024&lt;/span>*&lt;span style="color:#f60">1024&lt;/span>*&lt;span style="color:#f60">1024&lt;/span>) &lt;span style="color:#0f0"># 1GB&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Least Frequently Used (LFU) &lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache = Cache(eviction_policy=&lt;span style="color:#87ceeb">&amp;#34;least-frequently-used&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Least Recently Stored&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache = Cache(eviction_policy=&lt;span style="color:#87ceeb">&amp;#34;least-recently-stored&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># No eviction&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cache = Cache(eviction_policy=&lt;span style="color:#87ceeb">&amp;#34;none&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="pickle-fallback">Pickle Fallback&lt;/h2>
&lt;p>When a native format handler fails or isn&amp;rsquo;t available, the cache automatically falls back to pickle:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Custom object without specific handler&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> CustomData:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self, value):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        self.value = value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@cache.asset(expire=&lt;span style="color:#f60">3600&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_custom_object&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> CustomData(&lt;span style="color:#87ceeb">&amp;#34;important_data&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Automatically falls back to pickle storage&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>obj = get_custom_object()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="advanced-configuration">Advanced Configuration&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>cache = Cache(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> directory=&lt;span style="color:#87ceeb">&amp;#34;/var/cache/myapp&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dbname=&lt;span style="color:#87ceeb">&amp;#34;cache.db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> eviction_policy=&lt;span style="color:#87ceeb">&amp;#34;least-recently-used&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size_limit=&lt;span style="color:#f60">2&lt;/span>**&lt;span style="color:#f60">30&lt;/span>, &lt;span style="color:#0f0"># 1GB total cache size&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cull_limit=&lt;span style="color:#f60">100&lt;/span>, &lt;span style="color:#0f0"># Remove 100 items when culling&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_file_size=&lt;span style="color:#f60">2&lt;/span>**&lt;span style="color:#f60">15&lt;/span>, &lt;span style="color:#0f0"># 32KB threshold for file vs database storage&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sqlite_cache_size=&lt;span style="color:#f60">8192&lt;/span>, &lt;span style="color:#0f0"># SQLite page cache&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sqlite_mmap_size=&lt;span style="color:#f60">2&lt;/span>**&lt;span style="color:#f60">26&lt;/span>, &lt;span style="color:#0f0"># 64MB memory mapping&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> asset_handlers=[CustomHandler(), JSONHandler(), PickleHandler()]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="implementation-code">Implementation code&lt;/h2>
&lt;p>This is a zero-dependency implementation of disk-based caching with native format support. It includes the &lt;code>Cache&lt;/code> class, the &lt;code>AssetHandler&lt;/code> base class, and asset handlers for JSON, Parquet, Arrow, NumPy, and pickle.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> os
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> io
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> time
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> errno
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> shutil
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> codecs
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> pickle
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> sqlite3
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> hashlib
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> tempfile
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> threading
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> contextlib
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> typing &lt;span style="color:#f00">as&lt;/span> t
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> pickletools
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> pathlib &lt;span style="color:#f00">import&lt;/span> Path
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> functools &lt;span style="color:#f00">import&lt;/span> partial, wraps
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Type variables for preserving function signatures&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>P = t.ParamSpec(&lt;span style="color:#87ceeb">&amp;#39;P&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>T = t.TypeVar(&lt;span style="color:#87ceeb">&amp;#39;T&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>EVICTION_POLICY = {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;none&amp;#34;&lt;/span>: {&lt;span style="color:#87ceeb">&amp;#34;init&amp;#34;&lt;/span>: &lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;get&amp;#34;&lt;/span>: &lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;cull&amp;#34;&lt;/span>: &lt;span style="color:#f00">None&lt;/span>},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;least-recently-stored&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;init&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;CREATE INDEX IF NOT EXISTS Cache_store_time ON Cache (store_time)&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;get&amp;#34;&lt;/span>: &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;cull&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;SELECT &lt;/span>&lt;span style="color:#87ceeb">{fields}&lt;/span>&lt;span style="color:#87ceeb"> FROM Cache ORDER BY store_time LIMIT ?&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;least-recently-used&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;init&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;CREATE INDEX IF NOT EXISTS Cache_access_time ON Cache (access_time)&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;get&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;access_time = &lt;/span>&lt;span style="color:#87ceeb">{now}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;cull&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;SELECT &lt;/span>&lt;span style="color:#87ceeb">{fields}&lt;/span>&lt;span style="color:#87ceeb"> FROM Cache ORDER BY access_time LIMIT ?&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;least-frequently-used&amp;#34;&lt;/span>: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;init&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;CREATE INDEX IF NOT EXISTS Cache_access_count ON Cache (access_count)&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;get&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;access_count = access_count + 1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;cull&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;SELECT &lt;/span>&lt;span style="color:#87ceeb">{fields}&lt;/span>&lt;span style="color:#87ceeb"> FROM Cache ORDER BY access_count LIMIT ?&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>MODE_NONE = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>MODE_RAW = &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>MODE_BINARY = &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>MODE_TEXT = &lt;span style="color:#f60">3&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>MODE_PICKLE = &lt;span style="color:#f60">4&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> AssetHandler:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Base class for asset format handlers.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">    Subclasses should override format_name, can_handle, save, and load methods.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">    &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    format_name: str = &lt;span style="color:#87ceeb">&amp;#34;base&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle&lt;/span>(self, data: t.Any) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Check if this handler can process the given data.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">            data: The data to check
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">            bool: True if this handler can process the data
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">raise&lt;/span> NotImplementedError
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">save&lt;/span>(self, data: t.Any, path: str) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Save data to file.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">            data: The data to save
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">            path: The file path to save to
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">raise&lt;/span> NotImplementedError
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">load&lt;/span>(self, path: str) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Load data from file.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">            path: The file path to load from
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">            The loaded data
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">raise&lt;/span> NotImplementedError
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_available&lt;/span>(self) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Check if required dependencies are available.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">            bool: True if all dependencies are available
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> PickleHandler(AssetHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Default pickle handler for any Python object.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    format_name = &lt;span style="color:#87ceeb">&amp;#34;pickle&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle&lt;/span>(self, data: t.Any) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>  &lt;span style="color:#0f0"># Can handle anything&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">save&lt;/span>(self, data: t.Any, path: str) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">with&lt;/span> open(path, &lt;span style="color:#87ceeb">&amp;#34;wb&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">load&lt;/span>(self, path: str) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">with&lt;/span> open(path, &lt;span style="color:#87ceeb">&amp;#34;rb&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> pickle.load(f)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> JSONHandler(AssetHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;JSON handler for dicts and lists.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    format_name = &lt;span style="color:#87ceeb">&amp;#34;json&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle&lt;/span>(self, data: t.Any) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">if&lt;/span> not isinstance(data, (dict, list)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            json.dumps(data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> (ImportError, TypeError, ValueError):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">save&lt;/span>(self, data: t.Any, path: str) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">with&lt;/span> open(path, &lt;span style="color:#87ceeb">&amp;#34;w&amp;#34;&lt;/span>, encoding=&lt;span style="color:#87ceeb">&amp;#34;utf-8&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            json.dump(data, f, default=str, sort_keys=&lt;span style="color:#f00">False&lt;/span>, separators=(&lt;span style="color:#87ceeb">&amp;#34;,&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;: &amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">load&lt;/span>(self, path: str) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">with&lt;/span> open(path, &lt;span style="color:#87ceeb">&amp;#34;r&amp;#34;&lt;/span>, encoding=&lt;span style="color:#87ceeb">&amp;#34;utf-8&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> json.load(f)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_available&lt;/span>(self) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">import&lt;/span> json  &lt;span style="color:#0f0"># noqa: F401&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> ImportError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> PolarsHandler(AssetHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Polars DataFrame handler.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    format_name = &lt;span style="color:#87ceeb">&amp;#34;parquet&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle&lt;/span>(self, data: t.Any) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                hasattr(data, &lt;span style="color:#87ceeb">&amp;#34;__class__&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                and data.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__module__&lt;/span> == &lt;span style="color:#87ceeb">&amp;#34;polars.dataframe.frame&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                and data.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__name__&lt;/span> == &lt;span style="color:#87ceeb">&amp;#34;DataFrame&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> Exception:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">save&lt;/span>(self, data: t.Any, path: str) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        data.write_parquet(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">load&lt;/span>(self, path: str) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">import&lt;/span> polars &lt;span style="color:#f00">as&lt;/span> pl
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> pl.read_parquet(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_available&lt;/span>(self) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">import&lt;/span> polars  &lt;span style="color:#0f0"># noqa: F401&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> ImportError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> PandasHandler(AssetHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Pandas DataFrame handler.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    format_name = &lt;span style="color:#87ceeb">&amp;#34;parquet&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle&lt;/span>(self, data: t.Any) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                hasattr(data, &lt;span style="color:#87ceeb">&amp;#34;__class__&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                and data.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__module__&lt;/span> == &lt;span style="color:#87ceeb">&amp;#34;pandas.core.frame&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                and data.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__name__&lt;/span> == &lt;span style="color:#87ceeb">&amp;#34;DataFrame&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> Exception:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">save&lt;/span>(self, data: t.Any, path: str) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        data.to_parquet(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">load&lt;/span>(self, path: str) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">import&lt;/span> pandas &lt;span style="color:#f00">as&lt;/span> pd
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> pd.read_parquet(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_available&lt;/span>(self) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">import&lt;/span> pandas  &lt;span style="color:#0f0"># noqa: F401&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> ImportError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> NumpyHandler(AssetHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;NumPy array handler.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    format_name = &lt;span style="color:#87ceeb">&amp;#34;numpy&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle&lt;/span>(self, data: t.Any) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                hasattr(data, &lt;span style="color:#87ceeb">&amp;#34;__class__&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                and data.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__module__&lt;/span> == &lt;span style="color:#87ceeb">&amp;#34;numpy&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>                and data.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__name__&lt;/span> == &lt;span style="color:#87ceeb">&amp;#34;ndarray&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> Exception:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">save&lt;/span>(self, data: t.Any, path: str) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#0f0"># Remove .npy extension if present since np.save adds it automatically&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">if&lt;/span> path.endswith(&lt;span style="color:#87ceeb">&amp;#39;.npy&amp;#39;&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            path = path[:-&lt;span style="color:#f60">4&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        np.save(path, data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">load&lt;/span>(self, path: str) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#0f0"># np.save automatically adds .npy, so we need to handle both cases&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">if&lt;/span> not path.endswith(&lt;span style="color:#87ceeb">&amp;#39;.npy&amp;#39;&lt;/span>) and not os.path.exists(path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            path = path + &lt;span style="color:#87ceeb">&amp;#39;.npy&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> np.load(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_available&lt;/span>(self) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">import&lt;/span> numpy  &lt;span style="color:#0f0"># noqa: F401&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">except&lt;/span> ImportError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> ArrowHandler(AssetHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;PyArrow Table handler.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> format_name = &lt;span style="color:#87ceeb">&amp;#34;arrow&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle&lt;/span>(self, data: t.Any) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check for PyArrow Table&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hasattr(data, &lt;span style="color:#87ceeb">&amp;#34;__class__&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> and data.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__module__&lt;/span> == &lt;span style="color:#87ceeb">&amp;#34;pyarrow.lib&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> and data.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__name__&lt;/span> == &lt;span style="color:#87ceeb">&amp;#34;Table&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">save&lt;/span>(self, data: t.Any, path: str) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">import&lt;/span> pyarrow &lt;span style="color:#f00">as&lt;/span> pa
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Save as Arrow IPC format (Arrow file)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> pa.OSFile(path, &lt;span style="color:#87ceeb">&amp;#34;wb&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> sink:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> pa.ipc.new_file(sink, data.schema) &lt;span style="color:#f00">as&lt;/span> writer:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> writer.write_table(data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">load&lt;/span>(self, path: str) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">import&lt;/span> pyarrow &lt;span style="color:#f00">as&lt;/span> pa
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Read Arrow IPC format file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> pa.memory_map(path) &lt;span style="color:#f00">as&lt;/span> source:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pa.ipc.open_file(source).read_all()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_available&lt;/span>(self) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">import&lt;/span> pyarrow &lt;span style="color:#0f0"># noqa: F401&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> ImportError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">delete&lt;/span>(path: str | Path) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> os.path.isdir(path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> shutil.rmtree(path, ignore_errors=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> os.path.exists(path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.remove(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">pass&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> _Constant(tuple):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Pretty display of immutable constant.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__new__&lt;/span>(cls, name):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> tuple.&lt;span style="color:#ff0">__new__&lt;/span>(cls, (name,))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__repr__&lt;/span>(self):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">%s&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> % self[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>ENOVAL = _Constant(&lt;span style="color:#87ceeb">&amp;#34;ENOVAL&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>UNKNOWN = _Constant(&lt;span style="color:#87ceeb">&amp;#34;UNKNOWN&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">sqlite_execute_with_retry&lt;/span>(conn: sqlite3.Connection, statement: str, parameters: t.Iterable = ()) -&amp;gt; sqlite3.Cursor:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Execute a SQL statement with retry on database lock.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Retry the statement for up to 60 seconds while the error is &amp;#34;database is locked&amp;#34;.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> conn: SQLite connection.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> statement: SQL statement.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> parameters: SQL statement parameters.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> sqlite3.Cursor: a cursor object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Raises:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> sqlite3.OperationalError: if the error is not &amp;#34;database is locked&amp;#34;.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> TimeoutError: if the database is locked for more than 60 seconds.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start = time.monotonic() &lt;span style="color:#0f0"># monotonic clock: unaffected by system clock adjustments&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">while&lt;/span> &lt;span style="color:#f00">True&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> conn.execute(statement, parameters) &lt;span style="color:#0f0"># type: ignore[no-untyped-call]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> sqlite3.OperationalError &lt;span style="color:#f00">as&lt;/span> exc:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> str(exc) != &lt;span style="color:#87ceeb">&amp;#34;database is locked&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> &lt;span style="color:#0f0"># re-raise the original exception&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> diff = time.monotonic() - start
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> diff &amp;gt; &lt;span style="color:#f60">60&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> TimeoutError(&lt;span style="color:#87ceeb">&amp;#34;SQLite database is locked for more than 60 seconds&amp;#34;&lt;/span>) &lt;span style="color:#f00">from&lt;/span> None
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> time.sleep(&lt;span style="color:#f60">0.001&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">full_name&lt;/span>(func):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Return the full name of `func` by joining its module and qualified name.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> func.&lt;span style="color:#eedd82">__module__&lt;/span> + &lt;span style="color:#87ceeb">&amp;#34;.&amp;#34;&lt;/span> + func.&lt;span style="color:#eedd82">__qualname__&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">args_to_key&lt;/span>(base: tuple, args: tuple, kwargs: dict, typed: bool | &lt;span style="color:#f00">None&lt;/span>, ignore: tuple | &lt;span style="color:#f00">None&lt;/span>) -&amp;gt; tuple:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Create cache key out of function arguments.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> base: base of key
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> args: function arguments
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> kwargs: function keyword arguments
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> typed: include types in cache key
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ignore: positional or keyword args to ignore
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> cache key tuple
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ignore = ignore or ()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args = tuple(arg &lt;span style="color:#f00">for&lt;/span> index, arg in enumerate(args) &lt;span style="color:#f00">if&lt;/span> index not in ignore)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key = base + args + (&lt;span style="color:#f00">None&lt;/span>,)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> kwargs:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kwargs = {key: val &lt;span style="color:#f00">for&lt;/span> key, val in kwargs.items() &lt;span style="color:#f00">if&lt;/span> key not in ignore}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sorted_items = sorted(kwargs.items())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> item in sorted_items:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key += item
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> typed:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key += tuple(type(arg) &lt;span style="color:#f00">for&lt;/span> arg in args)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> kwargs:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key += tuple(type(value) &lt;span style="color:#f00">for&lt;/span> _, value in sorted_items)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> key
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> Cache:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Disk and file based caching.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> This class provides a cache that stores data on disk using SQLite as the backend.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> It supports various eviction policies, file storage, and transaction management.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> It can store data in different formats using asset handlers, allowing for efficient
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> storage and retrieval of data assets such as JSON, PyArrow Tables, Polars DataFrames,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Pandas DataFrames, NumPy arrays, and arbitrary Python objects using Pickle.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> directory: Directory to store cache files.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> sqlite_timeout: SQLite connection timeout in seconds.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> dbname: SQLite database name.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> pickle_protocol: Pickle protocol version.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> eviction_policy: Eviction policy for the cache. Options are &amp;#34;none&amp;#34;, &amp;#34;least-recently-stored&amp;#34;,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;least-recently-used&amp;#34;, &amp;#34;least-frequently-used&amp;#34;. Default is &amp;#34;least-recently-stored&amp;#34;.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> cull_limit: Number of items to cull when the cache is full.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> size_limit: Maximum cache size in bytes.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> min_file_size: Minimum file size in bytes before storing as file. Otherwise store as blob in database.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> sqlite_mmap_size: SQLite mmap size in bytes.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> sqlite_cache_size: SQLite cache size in pages.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> asset_handlers: List of AssetHandler instances for handling different data formats.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> directory: str | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sqlite_timeout: int = &lt;span style="color:#f60">60&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dbname: str = &lt;span style="color:#87ceeb">&amp;#34;cache.db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pickle_protocol: int = pickle.HIGHEST_PROTOCOL,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> eviction_policy: t.Literal[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;none&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;least-recently-stored&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;least-recently-used&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;least-frequently-used&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ] = &lt;span style="color:#87ceeb">&amp;#34;least-recently-stored&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cull_limit: int = &lt;span style="color:#f60">10&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size_limit: int = &lt;span style="color:#f60">2&lt;/span>**&lt;span style="color:#f60">30&lt;/span>, &lt;span style="color:#0f0"># 1GB&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_file_size: int = &lt;span style="color:#f60">2&lt;/span>**&lt;span style="color:#f60">15&lt;/span>, &lt;span style="color:#0f0"># 32KB&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sqlite_mmap_size: int = &lt;span style="color:#f60">2&lt;/span>**&lt;span style="color:#f60">26&lt;/span>, &lt;span style="color:#0f0"># 64MB&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sqlite_cache_size: int = &lt;span style="color:#f60">2&lt;/span>**&lt;span style="color:#f60">13&lt;/span>, &lt;span style="color:#0f0"># 8,192 pages&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> asset_handlers: t.List[AssetHandler] | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> directory is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> directory = tempfile.mkdtemp(prefix=&lt;span style="color:#87ceeb">&amp;#34;webcache-&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> directory = str(directory)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> directory = os.path.expanduser(directory)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> directory = os.path.expandvars(directory)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.directory = directory
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.dbname = dbname
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.sqlite_timeout = sqlite_timeout
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._local = threading.local()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._txn_id = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.pickle_protocol = pickle_protocol
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.eviction_policy = eviction_policy
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.cull_limit = cull_limit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.size_limit = size_limit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.min_file_size = min_file_size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.sqlite_mmap_size = sqlite_mmap_size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.sqlite_cache_size = sqlite_cache_size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Initialize asset handlers&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> asset_handlers is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Default handlers&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.asset_handlers = [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> JSONHandler(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ArrowHandler(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> PolarsHandler(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> PandasHandler(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> NumpyHandler(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> PickleHandler(), &lt;span style="color:#0f0"># Pickle last as fallback&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
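&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.asset_handlers = list(asset_handlers) &lt;span style="color:#0f0"># copy, so the append below never mutates the list passed in&lt;/span>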
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Ensure PickleHandler is always available as fallback&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not any(isinstance(h, PickleHandler) &lt;span style="color:#f00">for&lt;/span> h in self.asset_handlers):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.asset_handlers.append(PickleHandler())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not os.path.isdir(directory):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.makedirs(directory, &lt;span style="color:#f60">0o755&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> OSError &lt;span style="color:#f00">as&lt;/span> error:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> error.errno != errno.EEXIST:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> EnvironmentError(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> error.errno,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;Cache directory &amp;#34;&lt;/span>&lt;span style="color:#87ceeb">%s&lt;/span>&lt;span style="color:#87ceeb">&amp;#34; does not exist and could not be created&amp;#39;&lt;/span> % self.directory,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) &lt;span style="color:#f00">from&lt;/span> None
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = self.connect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Configure SQLite: WAL journal mode, full auto-vacuum, mmap size, synchronous mode, and cache size&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;PRAGMA journal_mode = WAL&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;PRAGMA auto_vacuum = FULL&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;PRAGMA mmap_size = &lt;/span>&lt;span style="color:#87ceeb">%d&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> % self.sqlite_mmap_size)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;PRAGMA synchronous = NORMAL&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;PRAGMA cache_size = &lt;/span>&lt;span style="color:#87ceeb">%d&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> % self.sqlite_cache_size)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ((self._page_size,),) = con.execute(&lt;span style="color:#87ceeb">&amp;#34;PRAGMA page_size&amp;#34;&lt;/span>).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;CREATE TABLE IF NOT EXISTS Settings (key TEXT NOT NULL UNIQUE, value)&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;INSERT OR REPLACE INTO Settings VALUES (?, ?)&amp;#34;&lt;/span>, (&lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>, &lt;span style="color:#f60">0&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;CREATE TABLE IF NOT EXISTS Cache (
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> rowid INTEGER PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key BLOB,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> raw INTEGER,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> store_time REAL,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> expire_time REAL,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> access_time REAL,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> access_count INTEGER DEFAULT 0,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> size INTEGER DEFAULT 0,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> mode INTEGER DEFAULT 0,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> filename TEXT,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> value BLOB
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> )&amp;#34;&amp;#34;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;CREATE UNIQUE INDEX IF NOT EXISTS Cache_key_raw ON Cache(key, raw)&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;CREATE INDEX IF NOT EXISTS Cache_expire_time ON Cache (expire_time) WHERE expire_time IS NOT NULL&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use triggers to keep size metadata up to date&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;CREATE TRIGGER IF NOT EXISTS Settings_size_insert
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> AFTER INSERT ON Cache FOR EACH ROW BEGIN
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> UPDATE Settings SET value = value + NEW.size
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> WHERE key = &amp;#39;size&amp;#39;; END&amp;#34;&amp;#34;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;CREATE TRIGGER IF NOT EXISTS Settings_size_update
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> AFTER UPDATE ON Cache FOR EACH ROW BEGIN
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> UPDATE Settings
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> SET value = value + NEW.size - OLD.size
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> WHERE key = &amp;#39;size&amp;#39;; END&amp;#34;&amp;#34;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;CREATE TRIGGER IF NOT EXISTS Settings_size_delete
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> AFTER DELETE ON Cache FOR EACH ROW BEGIN
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> UPDATE Settings SET value = value - OLD.size
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> WHERE key = &amp;#39;size&amp;#39;; END&amp;#34;&amp;#34;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> query = EVICTION_POLICY[self.eviction_policy][&lt;span style="color:#87ceeb">&amp;#34;init&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> query is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(query)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">connect&lt;/span>(self):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> local_pid = getattr(self._local, &lt;span style="color:#87ceeb">&amp;#34;pid&amp;#34;&lt;/span>, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pid = os.getpid()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> local_pid != pid:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.close()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._local.pid = pid
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = getattr(self._local, &lt;span style="color:#87ceeb">&amp;#34;con&amp;#34;&lt;/span>, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> con is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = self._local.con = sqlite3.connect(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.path.join(self.directory, self.dbname),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> timeout=self.sqlite_timeout,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> isolation_level=&lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> con
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">close&lt;/span>(self):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con: sqlite3.Connection = getattr(self._local, &lt;span style="color:#87ceeb">&amp;#34;con&amp;#34;&lt;/span>, &lt;span style="color:#f00">None&lt;/span>) &lt;span style="color:#0f0"># type: ignore[no-untyped-call]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> con is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.close()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> delattr(self._local, &lt;span style="color:#87ceeb">&amp;#34;con&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> AttributeError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">pass&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_disk_remove&lt;/span>(self, file_path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Remove a file given by `file_path` with cross-thread and cross-process safety.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_path = os.path.join(self.directory, file_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_dir, _ = os.path.split(full_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> contextlib.suppress(OSError):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.remove(full_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> contextlib.suppress(OSError):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.removedirs(full_dir)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @contextlib.contextmanager
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">transact&lt;/span>(self, retry=&lt;span style="color:#f00">False&lt;/span>, filename=&lt;span style="color:#f00">None&lt;/span>) -&amp;gt; t.Iterator[t.Tuple[sqlite3.Connection, t.Callable]]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Transaction context manager locking the cache.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> retry: whether to keep retrying `BEGIN IMMEDIATE` while the database is locked.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> filename: file to remove from disk if the transaction cannot be started.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Raises:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> TimeoutError: if the transaction cannot be started because the database is locked and `retry` is false.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Example:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Wrap a block of code in a transaction:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; with cache.transact() as (con, _):
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... con.execute(&amp;#34;CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, name TEXT)&amp;#34;)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... con.execute(&amp;#34;INSERT INTO test (name) VALUES (?)&amp;#34;, (&amp;#34;Alice&amp;#34;,))
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con: sqlite3.Connection = self.connect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filenames = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _disk_remove = self._disk_remove
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tid = threading.get_ident()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> txn_id = self._txn_id
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> tid == txn_id:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> begin = &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">while&lt;/span> &lt;span style="color:#f00">True&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;BEGIN IMMEDIATE&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> begin = &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._txn_id = tid
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> sqlite3.OperationalError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> retry:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">continue&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> filename is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _disk_remove(filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> TimeoutError &lt;span style="color:#f00">from&lt;/span> None
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">yield&lt;/span> con, filenames.append
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> BaseException:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> begin:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> self._txn_id == tid
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._txn_id = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;ROLLBACK&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> begin:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> self._txn_id == tid
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._txn_id = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(&lt;span style="color:#87ceeb">&amp;#34;COMMIT&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> name in filenames:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> name is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _disk_remove(name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">setting&lt;/span>(self, key: str, value: t.Any = ENOVAL, update: bool = &lt;span style="color:#f00">True&lt;/span>) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Get or set a setting in the cache.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = self.connect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> value is ENOVAL:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select = &lt;span style="color:#87ceeb">&amp;#34;SELECT value FROM Settings WHERE key = ?&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ((value,),) = sqlite_execute_with_retry(con, select, (key,)).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> update:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> statement = &lt;span style="color:#87ceeb">&amp;#34;UPDATE Settings SET value = ? WHERE key = ?&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sqlite_execute_with_retry(con, statement, (value, key))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">volume&lt;/span>(self) -&amp;gt; int:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Return estimated total size of cache on disk in bytes.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = self.connect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ((page_count,),) = con.execute(&lt;span style="color:#87ceeb">&amp;#34;PRAGMA page_count&amp;#34;&lt;/span>).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_size = self._page_size * page_count + self.setting(&lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> total_size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_disk_put&lt;/span>(self, key):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Convert `key` to fields key and raw for Cache table.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key: key to be stored in cache.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Tuple[Any, bool]: the database key and a `raw` flag that is true when the key is stored as-is (bytes, str, int, or float) and false when it is pickled.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># pylint: disable=unidiomatic-typecheck&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> type_key = type(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> type_key is bytes:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> sqlite3.Binary(key), &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (type_key is str)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> or (type_key is int and -&lt;span style="color:#f60">9223372036854775808&lt;/span> &amp;lt;= key &amp;lt;= &lt;span style="color:#f60">9223372036854775807&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> or (type_key is float)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> key, &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Pickle any other key type so it can be stored as a BLOB&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data = pickle.dumps(key, protocol=self.pickle_protocol)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result = pickletools.optimize(data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> sqlite3.Binary(result), &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">touch&lt;/span>(self, key: str, expire: float | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>, retry: bool = &lt;span style="color:#f00">False&lt;/span>) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Touch `key` in cache and update `expire` time.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> now = time.time()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> db_key, raw = self._disk_put(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expire_time = &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">if&lt;/span> expire is &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">else&lt;/span> now + expire
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> self.transact(retry) &lt;span style="color:#f00">as&lt;/span> (con, _):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;SELECT rowid, expire_time FROM Cache WHERE key = ? AND raw = ?&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (db_key, raw),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ((rowid, old_expire_time),) = rows
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> old_expire_time is &lt;span style="color:#f00">None&lt;/span> or old_expire_time &amp;gt; now:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;UPDATE Cache SET expire_time = ? WHERE rowid = ?&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (expire_time, rowid),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_disk_fetch&lt;/span>(self, mode: int, filename: str, value: t.Any, read: bool):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Convert fields `mode`, `filename`, and `value` from Cache table to value.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> If mode is MODE_RAW, return value as bytes. If mode is MODE_BINARY and read is true, return value as file handle,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> otherwise return value as bytes. If mode is MODE_TEXT, return value as string. If mode is MODE_PICKLE, read value
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> as pickle and return the result.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> mode: mode of the value. Options are MODE_RAW, MODE_BINARY, MODE_TEXT, MODE_PICKLE.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> filename: filename of the value.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> value: value to be fetched.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> read: if true, return an open file handle instead of bytes (applies to MODE_BINARY only).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Any: the fetched value as str, bytes, file handle, or any other type if mode is MODE_PICKLE.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> mode == MODE_RAW:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> bytes(value) &lt;span style="color:#f00">if&lt;/span> type(value) is sqlite3.Binary &lt;span style="color:#f00">else&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> mode == MODE_BINARY:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> read:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> open(os.path.join(self.directory, filename), &lt;span style="color:#87ceeb">&amp;#34;rb&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> open(os.path.join(self.directory, filename), &lt;span style="color:#87ceeb">&amp;#34;rb&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> reader:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> reader.read()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> mode == MODE_TEXT:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_path = os.path.join(self.directory, filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> open(full_path, &lt;span style="color:#87ceeb">&amp;#34;r&amp;#34;&lt;/span>, encoding=&lt;span style="color:#87ceeb">&amp;#34;UTF-8&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> reader:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> reader.read()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> mode == MODE_PICKLE:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> value is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> open(os.path.join(self.directory, filename), &lt;span style="color:#87ceeb">&amp;#34;rb&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> reader:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pickle.load(reader)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pickle.load(io.BytesIO(value))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key: str,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> default: t.Any = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> read: bool = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> return_expire_time: bool = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> retry: bool = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Get `key` from cache.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key: key to be retrieved from cache.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> default: default value to return if key is not found.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> read: return file handle to value.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> return_expire_time: if True, return a (value, expire_time) tuple.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> retry: whether to retry on database lock.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> db_key, raw = self._disk_put(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> update_column: str = EVICTION_POLICY[self.eviction_policy][&lt;span style="color:#87ceeb">&amp;#34;get&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select = &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;SELECT rowid, expire_time, mode, filename, value
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> FROM Cache WHERE key = ? AND raw = ?
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> AND (expire_time IS NULL OR expire_time &amp;gt; ?)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> return_expire_time:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> default = (default, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> self.transact(retry) &lt;span style="color:#f00">as&lt;/span> (con, _):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(select, (db_key, raw, time.time())).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> default
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ((rowid, db_expire_time, mode, filename, db_value),) = rows
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> value = self._disk_fetch(mode, filename, db_value, read)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> IOError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Key was deleted before we could retrieve result.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> default
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> update_column is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> now = time.time()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> update = &lt;span style="color:#87ceeb">&amp;#34;UPDATE Cache SET &lt;/span>&lt;span style="color:#87ceeb">%s&lt;/span>&lt;span style="color:#87ceeb"> WHERE rowid = ?&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(update % update_column.format(now=now), (rowid,))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> return_expire_time:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value, db_expire_time
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__getitem__&lt;/span>(self, key):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Return corresponding value for `key` from cache.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> value = self.get(key, default=ENOVAL, retry=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> value is ENOVAL:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> KeyError(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">read&lt;/span>(self, key, retry=&lt;span style="color:#f00">False&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Return file handle value corresponding to `key` from cache.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> handle = self.get(key, default=ENOVAL, read=&lt;span style="color:#f00">True&lt;/span>, retry=retry)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> handle is ENOVAL:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> KeyError(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> handle
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__contains__&lt;/span>(self, key: str) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Return `True` if `key` matching item is found in cache.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = self.connect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> db_key, raw = self._disk_put(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select = &lt;span style="color:#87ceeb">&amp;#34;SELECT rowid FROM Cache WHERE key = ? AND raw = ? AND (expire_time IS NULL OR expire_time &amp;gt; ?)&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(select, (db_key, raw, time.time())).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> bool(rows)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">exists&lt;/span>(self, key: str) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Return `True` if `key` matching item is found in cache.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> key in self
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_disk_filename&lt;/span>(self, key: t.Any = UNKNOWN, value: t.Any = UNKNOWN):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Return filename and full-path tuple for file storage.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hex_name = codecs.encode(os.urandom(&lt;span style="color:#f60">16&lt;/span>), &lt;span style="color:#87ceeb">&amp;#34;hex&amp;#34;&lt;/span>).decode(&lt;span style="color:#87ceeb">&amp;#34;utf-8&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sub_dir = os.path.join(hex_name[:&lt;span style="color:#f60">2&lt;/span>], hex_name[&lt;span style="color:#f60">2&lt;/span>:&lt;span style="color:#f60">4&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name = hex_name[&lt;span style="color:#f60">4&lt;/span>:] + &lt;span style="color:#87ceeb">&amp;#34;.val&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename = os.path.join(sub_dir, name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_path = os.path.join(self.directory, filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> filename, full_path
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_safe_filename&lt;/span>(self, base_filename: str, extension: str, max_length: int = &lt;span style="color:#f60">255&lt;/span>) -&amp;gt; str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Create a safe filename that doesn&amp;#39;t exceed filesystem limits.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> base_filename: Base filename without extension
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> extension: File extension (without dot)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> max_length: Maximum filename length (default 255 for most filesystems)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Safe filename that fits within length limits
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_name = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>base_filename&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>extension&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If filename is already short enough, return as-is&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(full_name) &amp;lt;= max_length:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> full_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate how much we need to truncate&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Reserve space for the extension, the dot and underscore separators, and the hash&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> available_length = max_length - len(extension) - &lt;span style="color:#f60">2&lt;/span> - &lt;span style="color:#f60">8&lt;/span> &lt;span style="color:#0f0"># 8 chars for hash&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> available_length &amp;lt;= &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Extension is too long, use only hash&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hash_str = hashlib.md5(base_filename.encode()).hexdigest()[:&lt;span style="color:#f60">8&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>hash_str&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>extension&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Truncate base and add hash to maintain uniqueness&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> truncated_base = base_filename[:available_length]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> hash_str = hashlib.md5(base_filename.encode()).hexdigest()[:&lt;span style="color:#f60">8&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>truncated_base&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>hash_str&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>extension&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_disk_store&lt;/span>(self, value, read, key=UNKNOWN):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Convert `value` to fields size, mode, filename, and value for Cache table.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> value: value to convert
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> read: True when value is file-like object
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key: key for item (default UNKNOWN)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> (size, mode, filename, value) tuple for Cache table
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># pylint: disable=unidiomatic-typecheck&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> type_value = type(value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_file_size = self.min_file_size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (type_value is str and len(value) &amp;lt; min_file_size)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> or (type_value is int and -&lt;span style="color:#f60">9223372036854775808&lt;/span> &amp;lt;= value &amp;lt;= &lt;span style="color:#f60">9223372036854775807&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> or (type_value is float)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f60">0&lt;/span>, MODE_RAW, &lt;span style="color:#f00">None&lt;/span>, value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> type_value is bytes:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(value) &amp;lt; min_file_size:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f60">0&lt;/span>, MODE_RAW, &lt;span style="color:#f00">None&lt;/span>, sqlite3.Binary(value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename, full_path = self._disk_filename(key, value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._disk_write(full_path, io.BytesIO(value), &lt;span style="color:#87ceeb">&amp;#34;xb&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> len(value), MODE_BINARY, filename, &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> type_value is str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename, full_path = self._disk_filename(key, value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._disk_write(full_path, io.StringIO(value), &lt;span style="color:#87ceeb">&amp;#34;x&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;UTF-8&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size = os.path.getsize(full_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> size, MODE_TEXT, filename, &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> read:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> reader = partial(value.read, &lt;span style="color:#f60">2&lt;/span>**&lt;span style="color:#f60">22&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename, full_path = self._disk_filename(key, value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> iterator = iter(reader, &lt;span style="color:#87ceeb">b&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size = self._disk_write(full_path, iterator, &lt;span style="color:#87ceeb">&amp;#34;xb&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> size, MODE_BINARY, filename, &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result = pickle.dumps(value, protocol=self.pickle_protocol)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(result) &amp;lt; min_file_size:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f60">0&lt;/span>, MODE_PICKLE, &lt;span style="color:#f00">None&lt;/span>, sqlite3.Binary(result)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename, full_path = self._disk_filename(key, value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._disk_write(full_path, io.BytesIO(result), &lt;span style="color:#87ceeb">&amp;#34;xb&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> len(result), MODE_PICKLE, filename, &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_disk_write&lt;/span>(self, full_path, iterator, mode, encoding=&lt;span style="color:#f00">None&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_dir, _ = os.path.split(full_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> count in range(&lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">11&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Ensure directory exists - use exist_ok to handle race conditions&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.makedirs(full_dir, exist_ok=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Try to open the file - if directory was deleted, this will fail&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> open(full_path, mode, encoding=encoding) &lt;span style="color:#f00">as&lt;/span> writer:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> chunk in iterator:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size += len(chunk)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> writer.write(chunk)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> OSError &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle various filesystem errors including permission issues,&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># directory deletion, disk full, etc.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> count == &lt;span style="color:#f60">10&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Give up after 10 tries&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> OSError(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Failed to write file after &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>count&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> attempts: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>e&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>) &lt;span style="color:#f00">from&lt;/span> e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Clean up partial file if it exists&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> contextlib.suppress(OSError):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> os.path.exists(full_path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.remove(full_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Brief delay before retry to allow for transient conditions to resolve&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> time.sleep(&lt;span style="color:#f60">0.001&lt;/span> * count) &lt;span style="color:#0f0"># Linear backoff: 1 ms, 2 ms, ...&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">continue&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">set&lt;/span>(self, key, value, expire=&lt;span style="color:#f00">None&lt;/span>, read=&lt;span style="color:#f00">False&lt;/span>, retry=&lt;span style="color:#f00">False&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Set corresponding `value` for `key` in cache.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key: key name
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> value: value to store
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> expire: expire time in seconds.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> read: True when value is a file-like object to read from.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> retry: whether to retry on database lock.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> now = time.time()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> db_key, raw = self._disk_put(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expire_time = &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">if&lt;/span> expire is &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">else&lt;/span> now + expire
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size, mode, filename, db_value = self._disk_store(value, read, key=key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> columns = (expire_time, size, mode, filename, db_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> self.transact(retry, filename) &lt;span style="color:#f00">as&lt;/span> (con, cleanup):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;SELECT rowid, filename FROM Cache WHERE key = ? AND raw = ?&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (db_key, raw),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ((rowid, old_filename),) = rows
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cleanup(old_filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._row_update(rowid, now, columns)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._row_insert(db_key, raw, now, columns)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._cull(now, con, cleanup)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__setitem__&lt;/span>(self, key, value):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Set corresponding `value` for `key` in cache.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.set(key, value, retry=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_row_update&lt;/span>(self, rowid, now, columns):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = self.connect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expire_time, size, mode, filename, value = columns
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;UPDATE Cache SET
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> store_time = ?,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> expire_time = ?,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> access_time = ?,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> access_count = ?,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> size = ?,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> mode = ?,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> filename = ?,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> value = ?
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> WHERE rowid = ?
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> now, &lt;span style="color:#0f0"># store_time&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expire_time,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> now, &lt;span style="color:#0f0"># access_time&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f60">0&lt;/span>, &lt;span style="color:#0f0"># access_count&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mode,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rowid,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_row_insert&lt;/span>(self, key, raw, now, columns):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = self.connect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expire_time, size, mode, filename, value = columns
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;INSERT INTO Cache(
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key, raw, store_time, expire_time, access_time,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> access_count, size, mode, filename, value
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> raw,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> now, &lt;span style="color:#0f0"># store_time&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expire_time,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> now, &lt;span style="color:#0f0"># access_time&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f60">0&lt;/span>, &lt;span style="color:#0f0"># access_count&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mode,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_cull&lt;/span>(self, now: int | float, con: sqlite3.Connection, cleanup: t.Callable, limit: int | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cull_limit = self.cull_limit &lt;span style="color:#f00">if&lt;/span> limit is &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">else&lt;/span> limit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> cull_limit == &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Evict expired keys&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select_expired_template = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;SELECT &lt;/span>&lt;span style="color:#87ceeb">%s&lt;/span>&lt;span style="color:#87ceeb"> FROM Cache WHERE expire_time IS NOT NULL AND expire_time &amp;lt; ? ORDER BY expire_time LIMIT ?&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select_expired = select_expired_template % &lt;span style="color:#87ceeb">&amp;#34;filename&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(select_expired, (now, cull_limit)).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> delete_expired = &lt;span style="color:#87ceeb">&amp;#34;DELETE FROM Cache WHERE rowid IN (&lt;/span>&lt;span style="color:#87ceeb">%s&lt;/span>&lt;span style="color:#87ceeb">)&amp;#34;&lt;/span> % (select_expired_template % &lt;span style="color:#87ceeb">&amp;#34;rowid&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(delete_expired, (now, cull_limit))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> (filename,) in rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cleanup(filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cull_limit -= len(rows)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> cull_limit == &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Evict keys by policy&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select_policy = EVICTION_POLICY[self.eviction_policy][&lt;span style="color:#87ceeb">&amp;#34;cull&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> select_policy is &lt;span style="color:#f00">None&lt;/span> or self.volume() &amp;lt; self.size_limit:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select_filename = select_policy.format(fields=&lt;span style="color:#87ceeb">&amp;#34;filename&amp;#34;&lt;/span>, now=now)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(select_filename, (cull_limit,)).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> delete = &lt;span style="color:#87ceeb">&amp;#34;DELETE FROM Cache WHERE rowid IN (&lt;/span>&lt;span style="color:#87ceeb">%s&lt;/span>&lt;span style="color:#87ceeb">)&amp;#34;&lt;/span> % (select_policy.format(fields=&lt;span style="color:#87ceeb">&amp;#34;rowid&amp;#34;&lt;/span>, now=now))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(delete, (cull_limit,))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> (filename,) in rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cleanup(filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">clear&lt;/span>(self, *args, retry=&lt;span style="color:#f00">False&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Remove items from cache.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> *args: Optional arguments to specify what to clear:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - No args: Clear all items (default behavior)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - str: Clear items matching the key name
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - callable: Clear items for memoized/asset decorated functions
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - list/tuple: Clear items matching multiple keys/functions
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> retry: Whether to retry on database lock
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> int: Number of items cleared
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Examples:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; cache = Cache()
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; cache.clear() # Clear all items
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; cache.clear(&amp;#34;my_key&amp;#34;) # Clear specific key
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; cache.clear(my_func) # Clear memoized function cache
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; cache.clear(&amp;#34;key1&amp;#34;, &amp;#34;key2&amp;#34;, my_func) # Clear multiple items
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Clear all items (original behavior)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select = &lt;span style="color:#87ceeb">&amp;#34;SELECT rowid, filename FROM Cache WHERE rowid &amp;gt; ? ORDER BY rowid LIMIT ?&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select_args = [&lt;span style="color:#f60">0&lt;/span>, &lt;span style="color:#f60">100&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> self._select_delete(select, select_args, retry=retry)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Build list of keys to clear&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys_to_clear = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> arg in args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(arg, str):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Direct key name&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys_to_clear.append(arg)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> callable(arg):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Memoized or asset decorated function&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> hasattr(arg, &lt;span style="color:#87ceeb">&amp;#34;__cache_key__&amp;#34;&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># This is a decorated function - we need to clear all its cached results&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># We&amp;#39;ll use a pattern match on the function&amp;#39;s full name&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func_name = full_name(arg)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys_to_clear.append(func_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Regular function, use its full name&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys_to_clear.append(full_name(arg))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> isinstance(arg, (list, tuple)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Recursively process lists/tuples&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> item in arg:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(item, str):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys_to_clear.append(item)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> callable(item):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> hasattr(item, &lt;span style="color:#87ceeb">&amp;#34;__cache_key__&amp;#34;&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys_to_clear.append(full_name(item))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys_to_clear.append(full_name(item))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Convert other types to string keys&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keys_to_clear.append(str(arg))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not keys_to_clear:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> self._clear_specific_keys(keys_to_clear, retry=retry)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_clear_specific_keys&lt;/span>(self, keys: t.List[str], retry: bool = &lt;span style="color:#f00">False&lt;/span>) -&amp;gt; int:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Clear cache entries for specific keys or key patterns.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> keys: List of keys or function names to clear
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> retry: Whether to retry on database lock
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Number of items cleared
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_cleared = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> key_pattern in keys:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle exact key matches&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> db_key, raw = self._disk_put(key_pattern)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Clear exact matches&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select = &lt;span style="color:#87ceeb">&amp;#34;SELECT rowid, filename FROM Cache WHERE key = ? AND raw = ?&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> self.transact(retry) &lt;span style="color:#f00">as&lt;/span> (con, cleanup):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(select, (db_key, raw)).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rowids = [str(row[&lt;span style="color:#f60">0&lt;/span>]) &lt;span style="color:#f00">for&lt;/span> row in rows]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> delete = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;DELETE FROM Cache WHERE rowid IN (&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;,&amp;#39;&lt;/span>.join(rowids)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">)&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(delete)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> row in rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cleanup(row[&lt;span style="color:#f60">1&lt;/span>]) &lt;span style="color:#0f0"># Clean up file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_cleared += len(rows)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> TimeoutError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">pass&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Also clear function-based cache entries (for memoized/asset functions)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># These have keys that start with the function name tuple&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> self.transact(retry) &lt;span style="color:#f00">as&lt;/span> (con, cleanup):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Get all non-raw cache entries and check them&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select = &lt;span style="color:#87ceeb">&amp;#34;SELECT rowid, filename, key FROM Cache WHERE raw = 0&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(select).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> matching_rows = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> rowid, filename, key_blob in rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Deserialize the key to check if it matches our function&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key_tuple = pickle.loads(bytes(key_blob))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(key_tuple, tuple) and len(key_tuple) &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> key_tuple[&lt;span style="color:#f60">0&lt;/span>] == key_pattern:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> matching_rows.append((rowid, filename))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> (pickle.PickleError, TypeError, IndexError):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">continue&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> matching_rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rowids = [str(row[&lt;span style="color:#f60">0&lt;/span>]) &lt;span style="color:#f00">for&lt;/span> row in matching_rows]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> delete = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;DELETE FROM Cache WHERE rowid IN (&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;,&amp;#39;&lt;/span>.join(rowids)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">)&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(delete)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> _, filename in matching_rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cleanup(filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_cleared += len(matching_rows)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> (TimeoutError, sqlite3.Error):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">pass&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> total_cleared
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_select_delete&lt;/span>(self, select, args, row_index=&lt;span style="color:#f60">0&lt;/span>, arg_index=&lt;span style="color:#f60">0&lt;/span>, retry=&lt;span style="color:#f00">False&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> count = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> delete = &lt;span style="color:#87ceeb">&amp;#34;DELETE FROM Cache WHERE rowid IN (&lt;/span>&lt;span style="color:#87ceeb">%s&lt;/span>&lt;span style="color:#87ceeb">)&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">while&lt;/span> &lt;span style="color:#f00">True&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> self.transact(retry) &lt;span style="color:#f00">as&lt;/span> (con, cleanup):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(select, args).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> count += len(rows)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(delete % &lt;span style="color:#87ceeb">&amp;#34;,&amp;#34;&lt;/span>.join(str(row[&lt;span style="color:#f60">0&lt;/span>]) &lt;span style="color:#f00">for&lt;/span> row in rows))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> row in rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args[arg_index] = row[row_index]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cleanup(row[-&lt;span style="color:#f60">1&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> TimeoutError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> TimeoutError(count) &lt;span style="color:#f00">from&lt;/span> None
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> count
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">cleanup&lt;/span>(self, force: bool = &lt;span style="color:#f00">False&lt;/span>, cache_timeout: int = &lt;span style="color:#f60">3600&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Remove all cache files that are older than `cache_timeout` seconds.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> force (bool): If True, delete all cache files regardless of age.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> cache_timeout (int): Cache timeout in seconds.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> path in Path(self.directory).iterdir():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> force:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> delete(Path(path))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">continue&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> age = time.time() - os.stat(path).st_mtime
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> age &amp;gt; cache_timeout:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> delete(Path(path))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">memoize&lt;/span>(self, name=&lt;span style="color:#f00">None&lt;/span>, typed=&lt;span style="color:#f00">False&lt;/span>, expire=&lt;span style="color:#f00">None&lt;/span>, ignore=()) -&amp;gt; t.Callable[[t.Callable[P, T]], t.Callable[P, T]]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Memoizing cache decorator.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Decorator to wrap callable with memoizing function using cache.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Repeated calls with the same arguments will lookup result in cache and
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> avoid function evaluation.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> If name is set to None (default), the callable name will be determined
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> automatically.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> When expire is set to zero, function results will not be set in the
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> cache. Cache lookups still occur, however. Read
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> :doc:`case-study-landing-page-caching` for example usage.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> If typed is set to True, function arguments of different types will be
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> cached separately. For example, f(3) and f(3.0) will be treated as
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> distinct calls with distinct results.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> The original underlying function is accessible through the __wrapped__
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> attribute. This is useful for introspection, for bypassing the cache,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> or for rewrapping the function with a different cache.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; cache = Cache()
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; @cache.memoize(expire=1, tag=&amp;#39;fib&amp;#39;)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... def fibonacci(number):
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... if number == 0:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... return 0
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... elif number == 1:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... return 1
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... else:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... return fibonacci(number - 1) + fibonacci(number - 2)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; print(fibonacci(100))
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 354224848179261915075
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> An additional `__cache_key__` attribute can be used to generate the
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> cache key used for the given arguments.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; key = fibonacci.__cache_key__(100)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; print(cache[key])
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 354224848179261915075
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Remember to call memoize when decorating a callable. If you forget,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> then a TypeError will occur. Note the lack of parentheses after
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> memoize below:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; @cache.memoize
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... def test():
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ... pass
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Traceback (most recent call last):
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ...
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> TypeError: name cannot be callable
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> cache: cache to store callable arguments and return values
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> str name: name given for callable (default None, automatic)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> bool typed: cache different types separately (default False)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> float expire: seconds until arguments expire
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> (default None, no expiry)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> str tag: text to associate with arguments (default None)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> set ignore: positional or keyword args to ignore (default ())
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> callable decorator
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> callable(name):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> TypeError(&lt;span style="color:#87ceeb">&amp;#34;name cannot be callable&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">decorator&lt;/span>(func: t.Callable[P, T]) -&amp;gt; t.Callable[P, T]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Decorator created by memoize() for callable `func`.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> base = (full_name(func),) &lt;span style="color:#f00">if&lt;/span> name is &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">else&lt;/span> (name,)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @wraps(func)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">wrapper&lt;/span>(*args: P.args, **kwargs: P.kwargs) -&amp;gt; T:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Wrapper for callable to cache arguments and return values.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key = wrapper.__cache_key__(*args, **kwargs) &lt;span style="color:#0f0"># type: ignore[no-untyped-call]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result = self.get(key, default=ENOVAL, retry=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> result is ENOVAL:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result = func(*args, **kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> expire is &lt;span style="color:#f00">None&lt;/span> or expire &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.set(key, result, expire, retry=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> result
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> t.cast(T, result)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__cache_key__&lt;/span>(*args, **kwargs):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Make key for cache given function arguments.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> args_to_key(base, args, kwargs, typed, ignore)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> wrapper.__cache_key__ = __cache_key__ &lt;span style="color:#0f0"># type: ignore[no-untyped-call]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> t.cast(t.Callable[P, T], wrapper)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> decorator
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">register_asset_handler&lt;/span>(self, handler: AssetHandler, prepend: bool = &lt;span style="color:#f00">False&lt;/span>) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Register a new asset handler.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> handler: The AssetHandler instance to register
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> prepend: If True, add to beginning of list (higher priority)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> prepend:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.asset_handlers.insert(&lt;span style="color:#f60">0&lt;/span>, handler)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Insert before PickleHandler if it exists&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pickle_idx = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, h in enumerate(self.asset_handlers):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(h, PickleHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pickle_idx = i
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> pickle_idx is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.asset_handlers.insert(pickle_idx, handler)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.asset_handlers.append(handler)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_get_asset_handler&lt;/span>(self, data: t.Any, format: str | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>) -&amp;gt; AssetHandler:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Get the appropriate asset handler for the data.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> data: The data to find a handler for
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> format: Optional format hint
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> AssetHandler: The handler that can process this data
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If format is specified, try to find handler with that format name&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> format:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> handler in self.asset_handlers:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> handler.format_name == format and handler.is_available():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> handler
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Otherwise, find first handler that can handle the data&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> handler in self.asset_handlers:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> handler.is_available() and handler.can_handle(data):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> handler
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># This should never happen if PickleHandler is in the list&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">&amp;#34;No handler found for data type&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_store_asset&lt;/span>(self, key: str, data: t.Any, expire: float | &lt;span style="color:#f00">None&lt;/span>, format: str | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Store an asset using the appropriate handler.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key: Cache key
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> data: Data to store
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> expire: Expiration time in seconds
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> format: Optional format hint
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> handler = self._get_asset_handler(data, format)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Generate filename with format extension and data type info to avoid collisions&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_type_info = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>type(data).&lt;span style="color:#eedd82">__module__&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>type(data).&lt;span style="color:#eedd82">__name__&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create a hash of the data type to keep filename manageable&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> type_hash = hashlib.md5(data_type_info.encode()).hexdigest()[:&lt;span style="color:#f60">8&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename, full_path = self._disk_filename(key, str(data)[:&lt;span style="color:#f60">100&lt;/span>]) &lt;span style="color:#0f0"># Use truncated str for filename&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use safe filename to ensure it doesn&amp;#39;t exceed filesystem limits&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Only apply to the actual filename part, not the directory path&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dir_part, name_part = os.path.split(filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name_base, _ = name_part.rsplit(&lt;span style="color:#87ceeb">&amp;#34;.&amp;#34;&lt;/span>, &lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> safe_name = self._safe_filename(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>name_base&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>type_hash&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>, handler.format_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename = os.path.join(dir_part, safe_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dir_part, name_part = os.path.split(full_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_path = os.path.join(dir_part, safe_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Save using handler&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_dir, _ = os.path.split(full_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.makedirs(full_dir, exist_ok=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> handler.save(data, full_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># For numpy handler, the actual file created might have .npy extension&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> actual_path = full_path
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(handler, NumpyHandler) and not os.path.exists(full_path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> actual_path = full_path + &lt;span style="color:#87ceeb">&amp;#39;.npy&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> size = os.path.getsize(actual_path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Clean up on failure - check both possible paths&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> path_to_clean in [full_path, full_path + &lt;span style="color:#87ceeb">&amp;#39;.npy&amp;#39;&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> os.path.exists(path_to_clean):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> os.remove(path_to_clean)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Store metadata in database&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> now = time.time()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> db_key, raw = self._disk_put(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expire_time = &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">if&lt;/span> expire is &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">else&lt;/span> now + expire
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Update filename to reflect the actual file created (for numpy handler)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(handler, NumpyHandler) and not os.path.exists(os.path.join(self.directory, filename)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename = filename + &lt;span style="color:#87ceeb">&amp;#39;.npy&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> self.transact(retry=&lt;span style="color:#f00">True&lt;/span>, filename=filename) &lt;span style="color:#f00">as&lt;/span> (con, cleanup):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if key already exists&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;SELECT rowid, filename FROM Cache WHERE key = ? AND raw = ?&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (db_key, raw),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ((rowid, old_filename),) = rows
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cleanup(old_filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Update existing entry&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;UPDATE Cache SET
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> store_time = ?, expire_time = ?, access_time = ?, access_count = ?,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> size = ?, mode = ?, filename = ?, value = ?
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> WHERE rowid = ?&amp;#34;&amp;#34;&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (now, expire_time, now, &lt;span style="color:#f60">0&lt;/span>, size, MODE_BINARY, filename, &lt;span style="color:#f00">None&lt;/span>, rowid),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Insert new entry&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con.execute(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;INSERT INTO Cache(
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> key, raw, store_time, expire_time, access_time,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> access_count, size, mode, filename, value
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)&amp;#34;&amp;#34;&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (db_key, raw, now, expire_time, now, &lt;span style="color:#f60">0&lt;/span>, size, MODE_BINARY, filename, &lt;span style="color:#f00">None&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._cull(now, con, cleanup)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_load_asset&lt;/span>(self, path: str, format: str | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Load an asset from file.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> path: File path
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> format: Optional format hint
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> The loaded data
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if file exists&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not os.path.exists(path):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> FileNotFoundError(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Asset file not found: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>path&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Determine format from file extension if not provided&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> format is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _, ext = os.path.splitext(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> format = ext.lstrip(&lt;span style="color:#87ceeb">&amp;#34;.&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Try to load with the specified/detected format handler&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> format_handler = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> handler in self.asset_handlers:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> handler.format_name == format and handler.is_available():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> format_handler = handler
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> format_handler:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> format_handler.load(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If specific format handler fails, try fallback strategies&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not isinstance(format_handler, PickleHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Try pickle fallback if the original format wasn&amp;#39;t pickle&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pickle_handler = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> handler in self.asset_handlers:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(handler, PickleHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pickle_handler = handler
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> pickle_handler:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pickle_handler.load(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If pickle also fails, raise the original error&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Failed to load asset with &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>format&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> format: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>e&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>) &lt;span style="color:#f00">from&lt;/span> e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If no fallback or fallback failed, raise original error&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Failed to load asset with &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>format&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> format: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>e&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>) &lt;span style="color:#f00">from&lt;/span> e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># No handler found for the format - try pickle as last resort&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pickle_handler = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> handler in self.asset_handlers:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(handler, PickleHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pickle_handler = handler
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> pickle_handler:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pickle_handler.load(path)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;No handler found for format &amp;#39;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>format&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#39; and pickle fallback failed: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>e&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>) &lt;span style="color:#f00">from&lt;/span> e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;No handler found for format: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>format&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">asset&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name: str | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> typed: bool | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expire: int | float | &lt;span style="color:#f00">None&lt;/span> = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ignore: tuple = (),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> format=&lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; t.Callable[[t.Callable[P, T]], t.Callable[P, T]]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Asset caching decorator.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Decorator to wrap callable with asset caching function.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Similar to memoize but stores data in native formats instead of pickle.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Supports various data types through duck typing:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - PyArrow Tables: stored as Arrow IPC files
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - Polars DataFrames: stored as Parquet files
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - Pandas DataFrames: stored as Parquet files
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - NumPy arrays: stored as .npy files
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - Dicts/Lists: stored as JSON or pickle based on content
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> - Other types: fallback to pickle
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> name: name given for callable (default None, automatic)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> typed: cache different types separately (default False)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> expire: seconds until arguments expire (default None, no expiry)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ignore: positional or keyword args to ignore (default ())
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> format: force specific format (arrow, parquet, numpy, json, pickle)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> callable decorator
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> callable(name):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> TypeError(&lt;span style="color:#87ceeb">&amp;#34;name cannot be callable&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">decorator&lt;/span>(func: t.Callable[P, T]) -&amp;gt; t.Callable[P, T]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Decorator created by asset() for callable `func`.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> base = (full_name(func),) &lt;span style="color:#f00">if&lt;/span> name is &lt;span style="color:#f00">None&lt;/span> &lt;span style="color:#f00">else&lt;/span> (name,)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @wraps(func)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">wrapper&lt;/span>(*args: P.args, **kwargs: P.kwargs) -&amp;gt; T:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Wrapper for callable to cache asset arguments and return values.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> key = wrapper.__cache_key__(*args, **kwargs) &lt;span style="color:#0f0"># type: ignore[attr-defined]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if asset exists in cache&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> con = self.connect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> db_key, raw = self._disk_put(key)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> select = &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;SELECT filename, mode FROM Cache WHERE key = ? AND raw = ?
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> AND (expire_time IS NULL OR expire_time &amp;gt; ?)&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rows = con.execute(select, (db_key, raw, time.time())).fetchall()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> rows:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ((filename, _),) = rows &lt;span style="color:#0f0"># mode not needed here, renamed to _&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> filename:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Load from file using appropriate loader&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> full_path = os.path.join(self.directory, filename)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> t.cast(T, self._load_asset(full_path, format))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Asset not in cache, compute it&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result = func(*args, **kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> expire is &lt;span style="color:#f00">None&lt;/span> or expire &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Store asset in appropriate format&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._store_asset(key, result, expire, format)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> result
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__cache_key__&lt;/span>(*args, **kwargs):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Make key for cache given function arguments.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> args_to_key(base, args, kwargs, typed, ignore)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> wrapper.__cache_key__ = __cache_key__ &lt;span style="color:#0f0"># type: ignore[attr-defined]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> t.cast(t.Callable[P, T], wrapper)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> decorator
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Document Chunking for LLMs</title><link>https://asifr.com/document-chunking-llm/</link><pubDate>Tue, 17 Jun 2025 00:00:00 +0000</pubDate><guid>https://asifr.com/document-chunking-llm/</guid><description>
&lt;p>I&amp;rsquo;ve found that splitting large documents into smaller chunks requires some trial and error to find the right strategy.&lt;/p>
&lt;p>There are basically three competing factors:&lt;/p>
&lt;ol>
&lt;li>Semantic coherence within chunks - the chunk should contain related information. A small chunk is more coherent but loses context, while a large chunk carries more context but is less coherent.&lt;/li>
&lt;li>Semantic separation between chunks - the chunks should be distinct enough to avoid overlap but not so distinct that they lose context. This means we want to avoid splitting chunks at arbitrary points like sentences or paragraphs and instead split them at semantic boundaries of sections and topics. Even having some overlap between chunks can be helpful to keep context.&lt;/li>
&lt;li>Information preservation - chunks should be self-contained enough to be useful on their own. This means that we need to split text at context boundaries and ensure that similar concepts are grouped together in the same chunk and not split across chunks.&lt;/li>
&lt;/ol>
&lt;p>Ultimately, the appropriate strategy depends on the type of document and the intended use case. For example, a user guide is organized into chapters and sections. The beginning of a chapter introduces the topic, followed by sections that provide detailed workflows and instructions. In this case, it makes sense to keep the workflows and instructions together in the same chunk. We want to avoid splitting a step in a workflow into two independent chunks that lose their place in the workflow and, more importantly, the context of the step. If the document is more structured, like an invoice or technical specifications with tables, then a more sophisticated chunking strategy may be needed to preserve the relationships between the data points. This article focuses on chunking textual documents, such as reports and manuals, where the text is more free-form and less structured.&lt;/p>
&lt;p>Table of contents:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#algorithm-overview">Algorithm Overview&lt;/a>&lt;/li>
&lt;li>&lt;a href="#example-berkshire-hathaway-annual-report">Example: Berkshire Hathaway Annual Report&lt;/a>&lt;/li>
&lt;li>&lt;a href="#implementation-code">Implementation Code&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="algorithm-overview">Algorithm Overview&lt;/h2>
&lt;p>Let&amp;rsquo;s build a chunking algorithm based on these principles from the ground up. Conceptually, we want to start with sentences since they represent the basic unit of coherent meaning. From there we need a strategy for iteratively merging sentences that balances the three stated objectives: semantic coherence within chunks, semantic separation between chunks, and information preservation.&lt;/p>
&lt;p>So the algorithm consists of two main stages:&lt;/p>
&lt;ol>
&lt;li>Semantic chunking breaks the document into semantically coherent chunks by:
&lt;ul>
&lt;li>Splitting into sentences&lt;/li>
&lt;li>Computing embeddings for each sentence&lt;/li>
&lt;li>Using cosine distance between consecutive sentence embeddings to detect topic shifts&lt;/li>
&lt;li>Enforcing size constraints while adding overlap&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Section grouping organizes chunks into a hierarchical structure by:
&lt;ul>
&lt;li>Generating descriptive metadata (title and category) for each chunk using an LLM&lt;/li>
&lt;li>Computing weighted similarity between chunk metadata&lt;/li>
&lt;li>Grouping similar chunks into sections based on similarity thresholds&lt;/li>
&lt;li>Creating subsections within sections for fine-grained organization&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
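&lt;p>The first stage can be sketched in a few lines. This is a minimal illustration and not the full implementation: &lt;code>embed&lt;/code> is a toy word-count stand-in for a real sentence embedding model, and the &lt;code>distance_threshold&lt;/code>, &lt;code>max_chars&lt;/code>, and &lt;code>overlap&lt;/code> defaults are assumed values chosen only to make the example readable.&lt;/p>

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences with a deliberately simple regex."""
    return [s.strip() for s in re.findall(r"[^.!?]+[.!?]*", text) if s.strip()]


def embed(sentence):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def semantic_chunks(text, distance_threshold=0.7, max_chars=200, overlap=1):
    """Merge consecutive sentences until a topic shift or the size limit."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # A large distance between neighbouring sentences signals a topic shift.
        topic_shift = cosine_distance(vectors[i - 1], vectors[i]) > distance_threshold
        too_long = len(" ".join(current)) + 1 + len(sentences[i]) > max_chars
        if topic_shift or too_long:
            chunks.append(" ".join(current))
            # Carry the trailing sentences forward so chunks partially overlap.
            current = current[-overlap:] if overlap else []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

&lt;p>Calling &lt;code>semantic_chunks(text, overlap=0)&lt;/code> gives non-overlapping chunks, while the default carries the last sentence of each chunk into the next one to keep some context across the boundary.&lt;/p>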
&lt;h2 id="example-berkshire-hathaway-annual-report">Example: Berkshire Hathaway Annual Report&lt;/h2>
&lt;p>As an example, I&amp;rsquo;ve taken the 2024 &lt;a href="https://www.berkshirehathaway.com/letters/2024ltr.pdf">Berkshire Hathaway Annual Report&lt;/a> and passed it through the chunking process. First I generate sentences from the document using regex, then compute embeddings and use cosine similarity to merge sentences into partially overlapping chunks. Next I take each chunk and use an LLM to generate metadata, including a title and category. Notice how there is consistency in the titles and categories across chunks. This is because the LLM is prompted with the trailing 5 metadata entries to provide context for the current chunk. Finally, I use the same embedding approach to merge chunks into sections based on the semantic similarity of the title and category of each chunk. A tunable similarity threshold of 0.75 is used to combine chunks into sections. The resulting sections are more coherent and self-contained, making them easier to use in a RAG system.&lt;/p>
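&lt;p>The rolling metadata context is simple to sketch. The snippet below is a hedged illustration: &lt;code>build_prompt&lt;/code> and &lt;code>label_chunks&lt;/code> are hypothetical names, the prompt wording is made up, and &lt;code>llm&lt;/code> is assumed to be any callable that takes a prompt string and returns a &lt;code>(title, category)&lt;/code> pair. Only the idea of passing the trailing 5 metadata entries comes from the process described above.&lt;/p>

```python
from collections import deque


def build_prompt(chunk, history):
    """Format a prompt that shows recent metadata before the new chunk."""
    lines = ["Previously assigned metadata (oldest first):"]
    lines += [f"- title: {title} | category: {category}" for title, category in history]
    lines += ["", "Give a short title and category for this chunk:", chunk]
    return "\n".join(lines)


def label_chunks(chunks, llm, context_size=5):
    """Label each chunk, feeding the trailing `context_size` labels to the LLM.

    `llm` is a hypothetical callable: prompt string in, (title, category) out.
    """
    # A bounded deque drops the oldest entry automatically once it is full.
    history = deque(maxlen=context_size)
    labelled = []
    for chunk in chunks:
        title, category = llm(build_prompt(chunk, history))
        history.append((title, category))
        labelled.append({"text": chunk, "title": title, "category": category})
    return labelled
```

&lt;p>Because the model sees its own recent titles and categories, it tends to reuse them for related chunks instead of inventing a new label each time, which is what makes the later similarity-based grouping workable.&lt;/p>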
&lt;p>The table below shows the title and category generated by an LLM for each chunk. A split happens when the similarity between the current chunk and the previous chunk falls below the threshold of 0.75 or when the chunk size exceeds a maximum character limit. When there is a split, the last column reports the similarity of the current chunk to the previous chunk.&lt;/p>
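&lt;p>The split rule itself can be written as a short sketch. The function names, the 0.6 title weight, and the &lt;code>max_chars&lt;/code> default are assumptions for illustration; only the 0.75 threshold and the two split conditions come from the description above. The similarities are taken as precomputed inputs so the example stays independent of any particular embedding model.&lt;/p>

```python
def weighted_similarity(title_sim, category_sim, title_weight=0.6):
    """Blend title and category similarity; the 0.6 weight is an assumed value."""
    return title_weight * title_sim + (1.0 - title_weight) * category_sim


def group_sections(chunk_lengths, similarities, threshold=0.75, max_chars=3000):
    """Group consecutive chunks into sections.

    chunk_lengths[i] is the character count of chunk i. similarities[i] is the
    weighted similarity of chunk i to chunk i - 1 (similarities[0] is unused).
    Returns a list of sections, each a list of chunk indices.
    """
    sections = []
    current, current_chars = [], 0
    for i, length in enumerate(chunk_lengths):
        if current:
            dissimilar = threshold > similarities[i]
            too_big = current_chars + length > max_chars
            # Start a new section on a topic change or when the size limit is hit.
            if dissimilar or too_big:
                sections.append(current)
                current, current_chars = [], 0
        current.append(i)
        current_chars += length
    if current:
        sections.append(current)
    return sections
```

&lt;p>The size check explains why a split can occur even when the similarity to the previous chunk is high: the section has simply grown past the character limit.&lt;/p>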
&lt;p>At the beginning of the letter to the shareholders, the chunks are smaller and more granular because Buffett is introducing several concepts and providing an overview of the risks, management philosophy, and strategic focus. These chunks span only a few sentences, and the similarity between them is lower. As the letter progresses, the chunks become larger and more coherent, with higher similarity scores as Buffett discusses specific topics in more detail. The final sections of the letter are larger and more comprehensive, covering specific companies, investments, and their performance in depth. Notice that even when the similarity between chunks is high and the title and category are the same, a new split is still created once the chunk size grows past the limit.&lt;/p>
&lt;style>
table tr th:first-child, table tr td:first-child {
width: 80px !important;
}
table tr th:nth-child(2), table tr td:nth-child(2) {
width: 50% !important;
}
table tr th:nth-child(3), table tr td:nth-child(3) {
width: 50% !important;
}
table tr th:nth-child(4), table tr td:nth-child(4) {
width: 80px !important;
}
&lt;/style>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Chunk&lt;/th>
&lt;th>Title&lt;/th>
&lt;th>Category&lt;/th>
&lt;th>Previous&lt;br>Chunk&lt;br>Similarity&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>Berkshire Hathaway Annual Report&lt;/td>
&lt;td>Financial Report&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>Berkshire Hathaway Annual Report&lt;/td>
&lt;td>Financial Report&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>3&lt;/td>
&lt;td>Company Transparency &amp;amp; Reporting&lt;/td>
&lt;td>Corporate Communication &amp;amp; Reporting&lt;/td>
&lt;td>0.541&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>4&lt;/td>
&lt;td>Responsibility &amp;amp; Communication&lt;/td>
&lt;td>Corporate Communication &amp;amp; Reporting&lt;/td>
&lt;td>0.74&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>5&lt;/td>
&lt;td>Strategic Review &amp;amp; Ownership Dialogue&lt;/td>
&lt;td>Corporate Communication &amp;amp; Reporting&lt;/td>
&lt;td>0.696&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>6&lt;/td>
&lt;td>Communication Strategy&lt;/td>
&lt;td>Business &amp;amp; Investor Relations&lt;/td>
&lt;td>0.576&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>7&lt;/td>
&lt;td>Berkshire Hathaway&amp;rsquo;s Risk Management Approach&lt;/td>
&lt;td>Financial Risk Management&lt;/td>
&lt;td>0.479&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>8&lt;/td>
&lt;td>Mistakes and Strategic Assessment&lt;/td>
&lt;td>Business Strategy&lt;/td>
&lt;td>0.528&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>9&lt;/td>
&lt;td>Mistakes in Berkshire Acquisitions&lt;/td>
&lt;td>Business Risk &amp;amp; Management&lt;/td>
&lt;td>0.641&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>10&lt;/td>
&lt;td>Mistakes in Hiring, Impacting Berkshire&lt;/td>
&lt;td>Corporate Management &amp;amp; Risk&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>11&lt;/td>
&lt;td>Mistakes in Hiring Assessment&lt;/td>
&lt;td>Personnel Management &amp;amp; Decision Making&lt;/td>
&lt;td>0.668&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>12&lt;/td>
&lt;td>Painful Mistakes, Diminishing Returns&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>0.506&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>13&lt;/td>
&lt;td>Mistakes and Delay&lt;/td>
&lt;td>Strategic &amp;amp; Operational&lt;/td>
&lt;td>0.679&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>14&lt;/td>
&lt;td>Mistakes and Analysis&lt;/td>
&lt;td>Business Strategy&lt;/td>
&lt;td>0.643&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>15&lt;/td>
&lt;td>Mistakes and Their Impact&lt;/td>
&lt;td>Business Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>16&lt;/td>
&lt;td>Word Frequency Analysis&lt;/td>
&lt;td>Text Analysis&lt;/td>
&lt;td>0.37&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>17&lt;/td>
&lt;td>Company Observations&lt;/td>
&lt;td>Business &amp;amp; Communication&lt;/td>
&lt;td>0.405&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>18&lt;/td>
&lt;td>Behavioral Observations&lt;/td>
&lt;td>Business &amp;amp; Risk&lt;/td>
&lt;td>0.652&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>19&lt;/td>
&lt;td>CEO Succession &amp;amp; Risk&lt;/td>
&lt;td>Corporate Management &amp;amp; Risk&lt;/td>
&lt;td>0.568&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>20&lt;/td>
&lt;td>CEO Transition &amp;amp; Risk&lt;/td>
&lt;td>Corporate Management &amp;amp; Risk&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>21&lt;/td>
&lt;td>CEO Succession &amp;amp; Berkshire&amp;rsquo;s Risk&lt;/td>
&lt;td>Corporate Strategy &amp;amp; Risk&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>22&lt;/td>
&lt;td>Berkshire CEO Philosophy&lt;/td>
&lt;td>Corporate Strategy &amp;amp; Leadership&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>23&lt;/td>
&lt;td>Pete Liegl&amp;rsquo;s Legacy&lt;/td>
&lt;td>Business &amp;amp; Leadership&lt;/td>
&lt;td>0.53&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>24&lt;/td>
&lt;td>Pete Liegl - A Wealthing Story&lt;/td>
&lt;td>Business &amp;amp; Financial&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>25&lt;/td>
&lt;td>Pete - Forest River Founder&lt;/td>
&lt;td>Business &amp;amp; Founding&lt;/td>
&lt;td>0.572&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>26&lt;/td>
&lt;td>Forest River Acquisition&lt;/td>
&lt;td>Business &amp;amp; Financial&lt;/td>
&lt;td>0.638&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>27&lt;/td>
&lt;td>RV Deal - Initial Communication&lt;/td>
&lt;td>Business &amp;amp; Communication &amp;amp; Financial&lt;/td>
&lt;td>0.648&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>28&lt;/td>
&lt;td>Berkshire Acquisition Deal&lt;/td>
&lt;td>Business &amp;amp; Finance&lt;/td>
&lt;td>0.646&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>29&lt;/td>
&lt;td>Meeting Details &amp;amp; Price Discussion&lt;/td>
&lt;td>Business &amp;amp; Communication &amp;amp; Strategy&lt;/td>
&lt;td>0.513&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>30&lt;/td>
&lt;td>Meeting &amp;amp; Deal&lt;/td>
&lt;td>Business &amp;amp; Communication&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>31&lt;/td>
&lt;td>Business Meeting &amp;amp; Financial Planning&lt;/td>
&lt;td>Business &amp;amp; Strategy&lt;/td>
&lt;td>0.664&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>32&lt;/td>
&lt;td>Berkshire Hathaway Business Deal&lt;/td>
&lt;td>Business &amp;amp; Financial&lt;/td>
&lt;td>0.576&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>33&lt;/td>
&lt;td>Real Estate Deal&lt;/td>
&lt;td>Business &amp;amp; Financial&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>34&lt;/td>
&lt;td>Real Estate Lease Dispute&lt;/td>
&lt;td>Business &amp;amp; Financial&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>35&lt;/td>
&lt;td>Meeting Dynamics&lt;/td>
&lt;td>Business &amp;amp; Communication &amp;amp; Strategy&lt;/td>
&lt;td>0.514&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>36&lt;/td>
&lt;td>Compensation Structure&lt;/td>
&lt;td>Financial &amp;amp; Human Resources&lt;/td>
&lt;td>0.429&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>37&lt;/td>
&lt;td>Compensation Structure&lt;/td>
&lt;td>Financial &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>38&lt;/td>
&lt;td>Berkshire&amp;rsquo;s Compensation Offer&lt;/td>
&lt;td>Financial &amp;amp; Business&lt;/td>
&lt;td>0.669&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>39&lt;/td>
&lt;td>Berkshire’s Financial Strategy&lt;/td>
&lt;td>Financial Strategy &amp;amp; Risk&lt;/td>
&lt;td>0.677&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>40&lt;/td>
&lt;td>Berkshire’s Early Success&lt;/td>
&lt;td>Business &amp;amp; Financial&lt;/td>
&lt;td>0.741&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>41&lt;/td>
&lt;td>Simple Success&lt;/td>
&lt;td>Business &amp;amp; Financial&lt;/td>
&lt;td>0.737&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>42&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.492&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>43&lt;/td>
&lt;td>Berkshire’s Performance&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>0.736&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>44&lt;/td>
&lt;td>Mistakes in Berkshire Acquisitions&lt;/td>
&lt;td>Business Risk &amp;amp; Management&lt;/td>
&lt;td>0.645&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>45&lt;/td>
&lt;td>Berkshire’s Strategic Imperfections&lt;/td>
&lt;td>Business Strategy &amp;amp; Risk&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>46&lt;/td>
&lt;td>Strategic Focus &amp;amp; Partnership&lt;/td>
&lt;td>Business &amp;amp; Strategy&lt;/td>
&lt;td>0.624&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>47&lt;/td>
&lt;td>Strategic Imperative&lt;/td>
&lt;td>Business &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>48&lt;/td>
&lt;td>CEO Mistakes &amp;amp; Analysis&lt;/td>
&lt;td>Business Strategy &amp;amp; Risk&lt;/td>
&lt;td>0.614&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>49&lt;/td>
&lt;td>Mistakes in Berkshire Acquisitions&lt;/td>
&lt;td>Business Risk &amp;amp; Management&lt;/td>
&lt;td>0.745&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>50&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>51&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>52&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>53&lt;/td>
&lt;td>Berkshire’s Mistakes &amp;amp; Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>54&lt;/td>
&lt;td>Mistakes in Berkshire Acquisitions&lt;/td>
&lt;td>Business Risk &amp;amp; Management&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>55&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>56&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>57&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>58&lt;/td>
&lt;td>Pete Liegl&amp;rsquo;s Performance&lt;/td>
&lt;td>Business &amp;amp; Financial&lt;/td>
&lt;td>0.475&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>59&lt;/td>
&lt;td>GEICO Restructuring&lt;/td>
&lt;td>Business &amp;amp; Strategy&lt;/td>
&lt;td>0.46&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>60&lt;/td>
&lt;td>GEICO Transformation&lt;/td>
&lt;td>Business Strategy &amp;amp; Operational&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>61&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.536&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>62&lt;/td>
&lt;td>Property Casualty Pricing Surge&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>0.522&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>63&lt;/td>
&lt;td>Convective Storm Damage&lt;/td>
&lt;td>Financial Risk &amp;amp; Strategic&lt;/td>
&lt;td>0.652&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>64&lt;/td>
&lt;td>Berkshire’s Recent Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.55&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>65&lt;/td>
&lt;td>Berkshire’s Recent Financial Challenges&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>66&lt;/td>
&lt;td>Insurance Losses &amp;amp; Strategic Risks&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.559&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>67&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.747&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>68&lt;/td>
&lt;td>Berkshire’s Financial Performance&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>0.74&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>69&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>70&lt;/td>
&lt;td>Berkshire’s Financial Performance&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>71&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>72&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>73&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>74&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>75&lt;/td>
&lt;td>Berkshire’s Recent Mistakes &amp;amp; Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.923&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>76&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>77&lt;/td>
&lt;td>Berkshire’s Recent Mistakes &amp;amp; Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>78&lt;/td>
&lt;td>Berkshire’s Recent Mistakes &amp;amp; Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>79&lt;/td>
&lt;td>Berkshire’s Recent Mistakes &amp;amp; Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.981&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>80&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>81&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>82&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>83&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>84&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>85&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>86&lt;/td>
&lt;td>Berkshire’s Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.982&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>87&lt;/td>
&lt;td>Berkshire’s Early Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>88&lt;/td>
&lt;td>Berkshire’s Early Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>89&lt;/td>
&lt;td>Berkshire’s Early Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>90&lt;/td>
&lt;td>Berkshire’s Strategic Missteps&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>91&lt;/td>
&lt;td>Berkshire’s Tax Burden&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>0.697&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>92&lt;/td>
&lt;td>Berkshire’s Tax Burden&lt;/td>
&lt;td>Financial &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>93&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>94&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>95&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>96&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>97&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>98&lt;/td>
&lt;td>Berkshire’s Financial Challenges&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>0.742&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>99&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.742&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>100&lt;/td>
&lt;td>Berkshire’s Financial Mistakes&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>101&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>102&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>103&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>104&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>105&lt;/td>
&lt;td>Berkshire’s Recent Mistakes &amp;amp; Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>106&lt;/td>
&lt;td>Berkshire’s Recent Mistakes &amp;amp; Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>107&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.964&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>108&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>109&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>110&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>111&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>112&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>113&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>114&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>115&lt;/td>
&lt;td>Berkshire’s Strategic Setbacks&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>116&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>117&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>118&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>119&lt;/td>
&lt;td>Berkshire’s Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>120&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>121&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>122&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>123&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>124&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>125&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>126&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>127&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>128&lt;/td>
&lt;td>Berkshire’s Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>129&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.851&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>130&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>131&lt;/td>
&lt;td>Berkshire’s Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>132&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>133&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>134&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>135&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>136&lt;/td>
&lt;td>Berkshire’s Early Struggles&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>137&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>138&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>139&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>140&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>141&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>142&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>143&lt;/td>
&lt;td>Berkshire’s Recent Mistakes &amp;amp; Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>144&lt;/td>
&lt;td>Berkshire’s Early Struggles&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>145&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.86&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>146&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>147&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>148&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>149&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>150&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>151&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>152&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>153&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>154&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>155&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>156&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>157&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>158&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>159&lt;/td>
&lt;td>CEOs&amp;rsquo; Mistakes&lt;/td>
&lt;td>Business Strategy &amp;amp; Risk&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>160&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>161&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>162&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>163&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>164&lt;/td>
&lt;td>Berkshire’s Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.851&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>165&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>166&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>167&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>168&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>169&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>170&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>171&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>172&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>173&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>174&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>175&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>176&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>177&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>178&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>179&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>180&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>181&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>182&lt;/td>
&lt;td>Strategic Setbacks&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>183&lt;/td>
&lt;td>Recent Mistakes in Berkshire&amp;rsquo;s Operations&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.743&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>184&lt;/td>
&lt;td>Hurricane, Tornado, and Wildfire Risks&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>0.547&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>185&lt;/td>
&lt;td>Strategic Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.55&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>186&lt;/td>
&lt;td>Auto Insurance Transition&lt;/td>
&lt;td>Financial &amp;amp; Strategic&lt;/td>
&lt;td>0.481&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>187&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.49&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>188&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>189&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>190&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>191&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>192&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>193&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>194&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>195&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>196&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>197&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>198&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>199&lt;/td>
&lt;td>Berkshire’s Strategic Shifts&lt;/td>
&lt;td>Business &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>200&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>201&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>202&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>203&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>204&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>205&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>206&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>207&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>208&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>209&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>210&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>211&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>212&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>213&lt;/td>
&lt;td>Berkshire’s Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>214&lt;/td>
&lt;td>Strategic Missteps&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>215&lt;/td>
&lt;td>Strategic Missteps&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>216&lt;/td>
&lt;td>Strategic Missteps&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>217&lt;/td>
&lt;td>Strategic Missteps&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>218&lt;/td>
&lt;td>Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.803&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>219&lt;/td>
&lt;td>Strategic Missteps&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>220&lt;/td>
&lt;td>Strategic Setbacks&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>221&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>222&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>223&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>224&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>225&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>226&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>227&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>228&lt;/td>
&lt;td>Strategic Missteps&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>229&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>230&lt;/td>
&lt;td>Strategic Mishaps&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>231&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>232&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>233&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>234&lt;/td>
&lt;td>Financial Risks &amp;amp; Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.671&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>235&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.694&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>236&lt;/td>
&lt;td>Recent Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>237&lt;/td>
&lt;td>Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.678&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>238&lt;/td>
&lt;td>Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>239&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.671&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>240&lt;/td>
&lt;td>Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.671&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>241&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.671&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>242&lt;/td>
&lt;td>Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.671&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>243&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.671&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>244&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>245&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>246&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>247&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>248&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>249&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>250&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>251&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>252&lt;/td>
&lt;td>Strategic Challenges&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.671&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>253&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>0.671&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>254&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>255&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>256&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>257&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>258&lt;/td>
&lt;td>Recent Business Mistakes&lt;/td>
&lt;td>Business &amp;amp; Risk &amp;amp; Strategy&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="implementation-code">Implementation Code&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> re
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> ollama
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> logging
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> typing &lt;span style="color:#f00">import&lt;/span> List
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> pathlib &lt;span style="color:#f00">import&lt;/span> Path
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> dataclasses &lt;span style="color:#f00">import&lt;/span> dataclass
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> pydantic &lt;span style="color:#f00">import&lt;/span> BaseModel, Field
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> sklearn.metrics.pairwise &lt;span style="color:#f00">import&lt;/span> cosine_similarity
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>logging.basicConfig(level=logging.DEBUG)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>log = logging.getLogger(&lt;span style="color:#eedd82">__name__&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Silence noisy DEBUG logs from third-party libraries (httpcore, httpx, pdfminer, urllib3)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>logging.getLogger(&lt;span style="color:#87ceeb">&amp;#34;httpcore&amp;#34;&lt;/span>).setLevel(logging.WARNING)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>logging.getLogger(&lt;span style="color:#87ceeb">&amp;#34;httpx&amp;#34;&lt;/span>).setLevel(logging.WARNING)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>logging.getLogger(&lt;span style="color:#87ceeb">&amp;#34;pdfminer&amp;#34;&lt;/span>).setLevel(logging.WARNING)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>logging.getLogger(&lt;span style="color:#87ceeb">&amp;#34;urllib3&amp;#34;&lt;/span>).setLevel(logging.WARNING)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@dataclass
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> ChunkingConfig:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> max_chunk_size: int = &lt;span style="color:#f60">1000&lt;/span> &lt;span style="color:#0f0"># characters&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_chunk_size: int = &lt;span style="color:#f60">100&lt;/span> &lt;span style="color:#0f0"># characters&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> similarity_threshold: float = &lt;span style="color:#f60">0.15&lt;/span> &lt;span style="color:#0f0"># cosine distance threshold for splitting (lower means more splits)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ollama_model: str = &lt;span style="color:#87ceeb">&amp;#34;nomic-embed-text:v1.5&amp;#34;&lt;/span> &lt;span style="color:#0f0"># Ollama model for embeddings&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> overlap_sentences: int = &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#0f0"># sentences to overlap between chunks&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@dataclass
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> SectioningConfig:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Model settings&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ollama_model: str = &lt;span style="color:#87ceeb">&amp;#34;gemma3:1b&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> max_tokens: int = &lt;span style="color:#f60">200&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> temperature: float = &lt;span style="color:#f60">0.1&lt;/span> &lt;span style="color:#0f0"># Low for consistency&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Similarity thresholds&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> section_merge_threshold: float = &lt;span style="color:#f60">0.75&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title_weight: float = &lt;span style="color:#f60">0.6&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category_weight: float = &lt;span style="color:#f60">0.4&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Size constraints&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> max_section_size: int = &lt;span style="color:#f60">3000&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_section_size: int = &lt;span style="color:#f60">300&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> SimpleChunkMetadata(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title: str = Field(..., description=&lt;span style="color:#87ceeb">&amp;#34;Concise descriptive title for this content&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category: str = Field(..., description=&lt;span style="color:#87ceeb">&amp;#34;Broad topic category&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> Subsection(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsection_title: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks: List[str]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> combined_content: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunk_count: int
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_length: int
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> DocumentSection(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> section_title: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsections: List[Subsection]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_chunks: int
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_length: int
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> SemanticChunker:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self, config: ChunkingConfig):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.config = config
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_embedding&lt;/span>(self, text: str) -&amp;gt; np.ndarray:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Get embedding from Ollama using the Python client&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Ensure the text is not empty, as empty strings might cause issues with embeddings&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not text.strip():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> np.zeros(&lt;span style="color:#f60">768&lt;/span>) &lt;span style="color:#0f0"># Return a zero vector for empty text, assuming 768 dimensions for nomic-embed-text&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = ollama.embeddings(model=self.config.ollama_model, prompt=text)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> np.array(response[&lt;span style="color:#87ceeb">&amp;#34;embedding&amp;#34;&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">split_into_sentences&lt;/span>(self, text: str) -&amp;gt; List[str]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Split text into sentences using regex&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Split on whitespace that follows . ! or ? and precedes an uppercase letter.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># The two lookbehinds skip single initials (J.) and two-letter abbreviations (Mr., Dr.).&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Longer abbreviations such as Mrs. are not covered and will still trigger a split.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sentence_pattern = &lt;span style="color:#87ceeb">r&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;(?&amp;lt;!\b[A-Z]\.)(?&amp;lt;!\b[A-Z][a-z]\.)(?&amp;lt;=[.!?])\s+(?=[A-Z])&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sentences = re.split(sentence_pattern, text)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> [s.strip() &lt;span style="color:#f00">for&lt;/span> s in sentences &lt;span style="color:#f00">if&lt;/span> s.strip()]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">chunk_sentences&lt;/span>(self, sentences: List[str]) -&amp;gt; List[str]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Chunks sentences based on semantic similarity, respecting max_chunk_size,
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> and adding overlap.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not sentences:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks: List[str] = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_sentences: List[str] = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_char_length = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Get embeddings for all sentences&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sentence_embeddings = [self.get_embedding(s) &lt;span style="color:#f00">for&lt;/span> s in sentences]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, sentence in enumerate(sentences):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sentence_length = len(sentence)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if adding the current sentence would exceed max_chunk_size&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># and if we have enough content to form a valid chunk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> current_chunk_char_length + sentence_length &amp;gt; self.config.max_chunk_size and current_chunk_sentences:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Defensive branch: the buffer is non-empty here, so its length is 0 only if it holds empty strings.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># In that case, emit the current sentence as a standalone chunk and start over.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> current_chunk_char_length == &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks.append(sentence)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_sentences = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_char_length = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">continue&lt;/span> &lt;span style="color:#0f0"># Move to the next sentence&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Finalize the current chunk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks.append(&lt;span style="color:#87ceeb">&amp;#34; &amp;#34;&lt;/span>.join(current_chunk_sentences))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Reset for the next chunk, adding overlap&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_sentences = sentences[max(&lt;span style="color:#f60">0&lt;/span>, i - self.config.overlap_sentences) : i]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_char_length = sum(len(s) &lt;span style="color:#f00">for&lt;/span> s in current_chunk_sentences)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Add sentence to current chunk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_sentences.append(sentence)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_char_length += sentence_length
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Semantic split check (only if there&amp;#39;s more than one sentence in current_chunk_sentences)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(current_chunk_sentences) &amp;gt; &lt;span style="color:#f60">1&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Compare the last two sentences in the current_chunk_sentences&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># We&amp;#39;re interested in the similarity between the last added sentence and the one before it&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># to detect a potential topic shift at the boundary.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> embed1 = sentence_embeddings[i - &lt;span style="color:#f60">1&lt;/span>] &lt;span style="color:#0f0"># Embedding of the sentence before the current one&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> embed2 = sentence_embeddings[i] &lt;span style="color:#0f0"># Embedding of the current sentence&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate cosine distance (1 - cosine_similarity)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># A higher distance means less similarity.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> np.dot(embed1, embed2) == &lt;span style="color:#f60">0&lt;/span> and np.linalg.norm(embed1) == &lt;span style="color:#f60">0&lt;/span> and np.linalg.norm(embed2) == &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle case where both embeddings are zero vectors (e.g., from empty strings), distance is 0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> distance = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> np.linalg.norm(embed1) == &lt;span style="color:#f60">0&lt;/span> or np.linalg.norm(embed2) == &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If one is zero and other is not, they are dissimilar&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> distance = &lt;span style="color:#f60">1.0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> distance = &lt;span style="color:#f60">1&lt;/span> - cosine_similarity(embed1.reshape(&lt;span style="color:#f60">1&lt;/span>, -&lt;span style="color:#f60">1&lt;/span>), embed2.reshape(&lt;span style="color:#f60">1&lt;/span>, -&lt;span style="color:#f60">1&lt;/span>))[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> distance &amp;gt; self.config.similarity_threshold:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> log.debug(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Semantic split detected between sentences &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i - &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> and &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> with distance &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>distance&lt;span style="color:#87ceeb">:&lt;/span>&lt;span style="color:#87ceeb">.4f&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Semantic split detected!&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If the chunk is long enough, finalize it before the current sentence.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> current_chunk_char_length - sentence_length &amp;gt;= self.config.min_chunk_size:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Append chunk excluding the current sentence, which starts a new one&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks.append(&lt;span style="color:#87ceeb">&amp;#34; &amp;#34;&lt;/span>.join(current_chunk_sentences[:-&lt;span style="color:#f60">1&lt;/span>]))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Reset for the next chunk, adding overlap&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_sentences = sentences[max(&lt;span style="color:#f60">0&lt;/span>, i - self.config.overlap_sentences) : i + &lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_chunk_char_length = sum(len(s) &lt;span style="color:#f00">for&lt;/span> s in current_chunk_sentences)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If the chunk is too short, we don&amp;#39;t split yet and continue to build it.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># This prevents very small chunks due to minor semantic shifts.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Add any remaining sentences as the last chunk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> current_chunk_sentences:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks.append(&lt;span style="color:#87ceeb">&amp;#34; &amp;#34;&lt;/span>.join(current_chunk_sentences))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Post-processing: drop empty chunks (small chunks are not merged here)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> final_chunks = [chunk &lt;span style="color:#f00">for&lt;/span> chunk in chunks &lt;span style="color:#f00">if&lt;/span> chunk.strip()]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> final_chunks
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> SectionGrouper:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self, sectioning_config: SectioningConfig, embedding_model: str = &lt;span style="color:#87ceeb">&amp;#34;nomic-embed-text:v1.5&amp;#34;&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.config = sectioning_config
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.embedding_model = embedding_model
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">generate_chunk_metadata&lt;/span>(self, chunk: str, previous_metadata) -&amp;gt; SimpleChunkMetadata:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Generate structured metadata for a chunk using Ollama&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Prepare previous metadata as a string for context&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> metadata = &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> previous_metadata:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> metadata = &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">\n&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>.join(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">. Title: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>meta.title&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">, Category: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>meta.category&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, (_, meta) in enumerate(previous_metadata)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> metadata != &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> metadata = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Previous metadata:&lt;/span>&lt;span style="color:#87ceeb">\n&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>metadata&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">\n\n&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prompt = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Analyze the following text and provide structured metadata:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">Text: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>chunk&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>metadata&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">Generate a JSON response with:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">- title: A concise 3-8 word descriptive title
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">- category: A broad 1-3 word topic category
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">Be specific and descriptive but concise. Keep consistency in section titles and categories across similar content.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = ollama.generate(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model=self.config.ollama_model,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prompt=prompt,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> format=SimpleChunkMetadata.model_json_schema(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> options={&lt;span style="color:#87ceeb">&amp;#34;temperature&amp;#34;&lt;/span>: self.config.temperature, &lt;span style="color:#87ceeb">&amp;#34;num_predict&amp;#34;&lt;/span>: self.config.max_tokens},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keep_alive=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Parse the JSON response&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> metadata_dict = json.loads(response[&lt;span style="color:#87ceeb">&amp;#34;response&amp;#34;&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> SimpleChunkMetadata(**metadata_dict)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> log.warning(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Failed to generate metadata for chunk, using fallback: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>e&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Fallback metadata. Note: str hash() is salted per process by default, so this title is not stable across runs.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> SimpleChunkMetadata(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title=&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Section &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>hash(chunk[:&lt;span style="color:#f60">100&lt;/span>]) % &lt;span style="color:#f60">1000&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category=&lt;span style="color:#87ceeb">&amp;#34;General&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_embedding&lt;/span>(self, text: str) -&amp;gt; np.ndarray:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Get embedding for text&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not text.strip():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> np.zeros(&lt;span style="color:#f60">768&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = ollama.embeddings(model=self.embedding_model, prompt=text)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> np.array(response[&lt;span style="color:#87ceeb">&amp;#34;embedding&amp;#34;&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">calculate_similarity&lt;/span>(self, metadata1: SimpleChunkMetadata, metadata2: SimpleChunkMetadata) -&amp;gt; float:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Calculate weighted similarity between two chunk metadata objects&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Get embeddings for each field&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title1_emb = self.get_embedding(metadata1.title)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title2_emb = self.get_embedding(metadata2.title)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category1_emb = self.get_embedding(metadata1.category)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category2_emb = self.get_embedding(metadata2.category)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate cosine similarities&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">safe_cosine_similarity&lt;/span>(emb1, emb2):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> np.linalg.norm(emb1) == &lt;span style="color:#f60">0&lt;/span> or np.linalg.norm(emb2) == &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f60">0.0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> cosine_similarity(emb1.reshape(&lt;span style="color:#f60">1&lt;/span>, -&lt;span style="color:#f60">1&lt;/span>), emb2.reshape(&lt;span style="color:#f60">1&lt;/span>, -&lt;span style="color:#f60">1&lt;/span>))[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title_sim = safe_cosine_similarity(title1_emb, title2_emb)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category_sim = safe_cosine_similarity(category1_emb, category2_emb)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Weighted similarity&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> weighted_sim = self.config.title_weight * title_sim + self.config.category_weight * category_sim
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> weighted_sim
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">group_chunks_into_sections&lt;/span>(self, chunks: List[str]) -&amp;gt; List[DocumentSection]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Group chunks into coherent sections based on metadata similarity&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not chunks:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> log.info(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Generating metadata for &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>len(chunks)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> chunks...&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Stage 1: Generate metadata for all chunks&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunk_metadata = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, chunk in enumerate(chunks):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># We pass in the last 5 metadata items as context for the next chunk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> metadata = self.generate_chunk_metadata(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunk, chunk_metadata[-&lt;span style="color:#f60">5&lt;/span>:]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunk_metadata.append((chunk, metadata))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> log.debug(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Chunk &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>metadata.title&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> | &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>metadata.category&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> log.info(&lt;span style="color:#87ceeb">&amp;#34;Grouping chunks into sections...&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Stage 2: Group chunks based on similarity&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sections = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_section_chunks = [(chunk_metadata[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#f60">0&lt;/span>], chunk_metadata[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#f60">1&lt;/span>])]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_section_category = chunk_metadata[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#f60">1&lt;/span>].category
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(&lt;span style="color:#f60">1&lt;/span>, len(chunk_metadata)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunk, metadata = chunk_metadata[i]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prev_metadata = chunk_metadata[i - &lt;span style="color:#f60">1&lt;/span>][&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate similarity with previous chunk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> similarity = self.calculate_similarity(metadata, prev_metadata)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if we should start a new section&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> should_split = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Similarity is below threshold, so the current chunk likely starts&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># a different topic from the previous chunk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> similarity &amp;lt; self.config.section_merge_threshold
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Or if the current section is too long&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> or sum(len(c[&lt;span style="color:#f60">0&lt;/span>]) &lt;span style="color:#f00">for&lt;/span> c in current_section_chunks) + len(chunk) &amp;gt; self.config.max_section_size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> should_split and len(current_section_chunks) &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Finalize current section&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> section = self._create_section(current_section_chunks, current_section_category)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sections.append(section)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Start new section&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_section_chunks = [(chunk, metadata)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_section_category = metadata.category
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> log.debug(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;New section started at chunk &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">, similarity: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>similarity&lt;span style="color:#87ceeb">:&lt;/span>&lt;span style="color:#87ceeb">.3f&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Add to current section&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_section_chunks.append((chunk, metadata))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Add final section&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> current_section_chunks:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> section = self._create_section(current_section_chunks, current_section_category)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sections.append(section)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> log.info(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Created &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>len(sections)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> sections from &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>len(chunks)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> chunks&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> sections
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_create_section&lt;/span>(self, section_chunks: List[tuple], category: str) -&amp;gt; DocumentSection:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Create a DocumentSection from a list of (chunk, metadata) tuples&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Group chunks into subsections based on weighted title and category similarity&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsections = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_subsection = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_title = section_chunks[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#f60">1&lt;/span>].title
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> chunk, metadata in section_chunks:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not current_subsection:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_subsection = [(chunk, metadata)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_title = metadata.title
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if we should group with current subsection&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> title_sim = self.calculate_similarity(metadata, current_subsection[-&lt;span style="color:#f60">1&lt;/span>][&lt;span style="color:#f60">1&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> title_sim &amp;gt; &lt;span style="color:#f60">0.8&lt;/span>: &lt;span style="color:#0f0"># High threshold for subsection grouping&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_subsection.append((chunk, metadata))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create subsection from current group&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsection = self._create_subsection(current_subsection, current_title)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsections.append(subsection)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Start new subsection&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_subsection = [(chunk, metadata)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_title = metadata.title
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Add final subsection&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> current_subsection:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsection = self._create_subsection(current_subsection, current_title)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsections.append(subsection)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create section title from the title or category of the first chunk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> section_title = self._generate_section_title(section_chunks)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> DocumentSection(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> section_title=section_title,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category=category,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsections=subsections,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_chunks=len(section_chunks),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_length=sum(len(chunk) &lt;span style="color:#f00">for&lt;/span> chunk, _ in section_chunks),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_create_subsection&lt;/span>(self, subsection_chunks: List[tuple], title: str) -&amp;gt; Subsection:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Create a Subsection from a list of (chunk, metadata) tuples&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks = [chunk &lt;span style="color:#f00">for&lt;/span> chunk, _ in subsection_chunks]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> combined_content = &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">\n\n&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>.join(chunks)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> Subsection(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsection_title=title,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks=chunks,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> combined_content=combined_content,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunk_count=len(chunks),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_length=len(combined_content),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_generate_section_title&lt;/span>(self, section_chunks: List[tuple]) -&amp;gt; str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Generate a representative title for the section&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use the title from the first chunk, or create from category&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> section_chunks:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> first_metadata = section_chunks[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(section_chunks) == &lt;span style="color:#f60">1&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> first_metadata.title
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># For multi-chunk sections, use category-based title&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>first_metadata.category&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> Overview&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;Untitled Section&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">pdf_to_markdown&lt;/span>(pdf_file: Path) -&amp;gt; str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">from&lt;/span> markitdown &lt;span style="color:#f00">import&lt;/span> MarkItDown
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> md = MarkItDown()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> result = md.convert(pdf_file)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> text_content = result.text_content
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Keep only ascii characters&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> text_content = &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>.join(c &lt;span style="color:#f00">for&lt;/span> c in text_content &lt;span style="color:#f00">if&lt;/span> ord(c) &amp;lt; &lt;span style="color:#f60">128&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Remove leading and trailing whitespace&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> text_content = text_content.strip()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Drop blank lines and strip whitespace from each line&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> text_content = &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">\n&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>.join(line.strip() &lt;span style="color:#f00">for&lt;/span> line in text_content.splitlines() &lt;span style="color:#f00">if&lt;/span> line.strip())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> text_content
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_pdf&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">import&lt;/span> requests
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Download the sample PDF&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> url = &lt;span style="color:#87ceeb">&amp;#34;https://www.berkshirehathaway.com/letters/2024ltr.pdf&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = requests.get(url, timeout=&lt;span style="color:#f60">30&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> response.status_code != &lt;span style="color:#f60">200&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Failed to download PDF: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>response.status_code&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pdf_file = Path(&lt;span style="color:#87ceeb">&amp;#34;./example_cache/2024ltr.pdf&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pdf_file.parent.mkdir(parents=&lt;span style="color:#f00">True&lt;/span>, exist_ok=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pdf_file.write_bytes(response.content)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Convert PDF to markdown text&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> text = pdf_to_markdown(pdf_file)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> text
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Usage example&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">main&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Download the sample PDF and convert it to text&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sample_text = get_pdf()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Initialize chunker&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunking_config = ChunkingConfig(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> max_chunk_size=&lt;span style="color:#f60">750&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_chunk_size=&lt;span style="color:#f60">150&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> similarity_threshold=&lt;span style="color:#f60">0.25&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ollama_model=&lt;span style="color:#87ceeb">&amp;#34;nomic-embed-text:v1.5&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> overlap_sentences=&lt;span style="color:#f60">2&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Initialize section grouper&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sectioning_config = SectioningConfig(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ollama_model=&lt;span style="color:#87ceeb">&amp;#34;gemma3:1b&amp;#34;&lt;/span>, section_merge_threshold=&lt;span style="color:#f60">0.75&lt;/span>, max_section_size=&lt;span style="color:#f60">3000&lt;/span>, min_section_size=&lt;span style="color:#f60">300&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Stage 1: Semantic chunking&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunker = SemanticChunker(chunking_config)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sentences = chunker.split_into_sentences(sample_text)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> chunks = chunker.chunk_sentences(sentences)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Stage 1 Complete - Generated &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>len(chunks)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> semantic chunks&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Stage 2: Section grouping&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> grouper = SectionGrouper(sectioning_config)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sections = grouper.group_chunks_into_sections(chunks)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Write sections to JSON file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_file = Path(&lt;span style="color:#87ceeb">&amp;#34;./example_cache/sections.json&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output_file.parent.mkdir(parents=&lt;span style="color:#f00">True&lt;/span>, exist_ok=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> output_file.open(&lt;span style="color:#87ceeb">&amp;#34;w&amp;#34;&lt;/span>, encoding=&lt;span style="color:#87ceeb">&amp;#34;utf-8&amp;#34;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> json.dump([section.model_dump() &lt;span style="color:#f00">for&lt;/span> section in sections], f, indent=&lt;span style="color:#f60">2&lt;/span>, ensure_ascii=&lt;span style="color:#f00">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> log.info(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Sections written to &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>output_file&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Stage 2 Complete - Generated &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>len(sections)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> sections&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Display results&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">\n&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> + &lt;span style="color:#87ceeb">&amp;#34;=&amp;#34;&lt;/span> * &lt;span style="color:#f60">80&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">&amp;#34;DOCUMENT STRUCTURE&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">&amp;#34;=&amp;#34;&lt;/span> * &lt;span style="color:#f60">80&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, section in enumerate(sections, &lt;span style="color:#f60">1&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">\n&lt;/span>&lt;span style="color:#87ceeb">SECTION &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>section.section_title&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Category: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>section.category&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Total chunks: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>section.total_chunks&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> | Length: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>section.total_length&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> chars&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">&amp;#34;-&amp;#34;&lt;/span> * &lt;span style="color:#f60">60&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> j, subsection in enumerate(section.subsections, &lt;span style="color:#f60">1&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34; &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>j&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>subsection.subsection_title&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34; Chunks: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>subsection.chunk_count&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> | Length: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>subsection.total_length&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> chars&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Show first chunk preview&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> subsection.chunks:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> preview = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> subsection.chunks[&lt;span style="color:#f60">0&lt;/span>][:&lt;span style="color:#f60">200&lt;/span>] + &lt;span style="color:#87ceeb">&amp;#34;...&amp;#34;&lt;/span> &lt;span style="color:#f00">if&lt;/span> len(subsection.chunks[&lt;span style="color:#f60">0&lt;/span>]) &amp;gt; &lt;span style="color:#f60">200&lt;/span> &lt;span style="color:#f00">else&lt;/span> subsection.chunks[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34; Preview: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>preview&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print()
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Python Dependency Injection</title><link>https://asifr.com/python-dependency-injection/</link><pubDate>Sun, 15 Jun 2025 08:14:56 -0400</pubDate><guid>https://asifr.com/python-dependency-injection/</guid><description>
&lt;p>Dependency Injection (DI) is a software design pattern in which a class or function receives instances of the services it needs from the outside rather than creating them itself. The &lt;a href="https://pypi.org/project/fastapi/">FastAPI&lt;/a> framework provides a neat way to declare dependencies using Python&amp;rsquo;s type hints and the &lt;code>Depends&lt;/code> function. The &lt;a href="https://pypi.org/project/fast-depends/">fast-depends&lt;/a> package extracts FastAPI&amp;rsquo;s dependency injection logic and strips out the web framework-specific code, leaving a small library that can be used in any Python project.&lt;/p>
&lt;p>Here is an example that uses the custom &lt;a href="#implementation-code">single-file implementation&lt;/a> of dependency injection at the bottom of this page.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> typing &lt;span style="color:#f00">as&lt;/span> t
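&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> depends &lt;span style="color:#f00">import&lt;/span> Depends, inject  &lt;span style="color:#0f0"># depends.py is the module at the bottom of this page&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>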
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_settings&lt;/span>() -&amp;gt; Settings:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> Settings()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_db&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    settings: t.Annotated[Settings, Depends(get_settings)],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; DatabaseConnection:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    db = DatabaseConnection(settings)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> db
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">compute_something&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    db: t.Annotated[DatabaseConnection, Depends(get_db)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#0f0"># Use db connection here&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">pass&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Call the function with dependencies injected&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = compute_something() &lt;span style="color:#0f0"># Automatically resolves dependencies&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>db = get_db() &lt;span style="color:#0f0"># You can also call the dependency directly&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = compute_something(db=db) &lt;span style="color:#0f0"># Pass the db directly if needed&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>get_*&lt;/code> functions return instances of the required services, and the &lt;code>Depends&lt;/code> function is used to declare dependencies. The &lt;code>inject&lt;/code> decorator enables dependency injection for the function by automatically resolving the dependencies when the function is called. The default behavior is to cache the results of dependencies, so if a dependency is called multiple times, it will return the cached result instead of executing the function again (e.g. Settings and DB connection are created once and reused).&lt;/p>
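&lt;p>To make the resolution and caching behavior concrete, here is a minimal standard-library sketch. It is not the implementation from this page: the tiny &lt;code>Depends&lt;/code> and &lt;code>inject&lt;/code> below are simplified stand-ins that only handle keyword arguments and default-value markers, and scoping the cache to a single call is an assumption made for illustration.&lt;/p>

```python
import functools
import inspect
from typing import Any, Callable


class Depends:
    """Marker holding the callable that produces a dependency (simplified stand-in)."""

    def __init__(self, dependency: Callable[..., Any], use_cache: bool = True) -> None:
        self.dependency = dependency
        self.use_cache = use_cache


def _resolve(func: Callable[..., Any], supplied: dict, cache: dict) -> Any:
    """Call func, filling every parameter whose default is a Depends marker."""
    kwargs = dict(supplied)
    for name, param in inspect.signature(func).parameters.items():
        marker = param.default
        if name in kwargs or not isinstance(marker, Depends):
            continue  # explicitly passed, or not a dependency at all
        if marker.use_cache and marker.dependency in cache:
            kwargs[name] = cache[marker.dependency]
            continue
        # Dependencies may declare their own dependencies, so recurse.
        value = _resolve(marker.dependency, {}, cache)
        if marker.use_cache:
            cache[marker.dependency] = value
        kwargs[name] = value
    return func(**kwargs)


def inject(func: Callable[..., Any]) -> Callable[..., Any]:
    """Resolve Depends defaults on each call (assumption: the cache lives for one call)."""

    @functools.wraps(func)
    def wrapper(**kwargs: Any) -> Any:
        return _resolve(func, kwargs, cache={})

    return wrapper


calls = []


def get_settings() -> dict:
    calls.append("settings")
    return {"dsn": "sqlite://"}


def get_db(settings: dict = Depends(get_settings)) -> str:
    return "db(" + settings["dsn"] + ")"


@inject
def report(db: str = Depends(get_db), settings: dict = Depends(get_settings)) -> str:
    return db + " with " + str(len(settings)) + " setting(s)"


print(report())           # prints: db(sqlite://) with 1 setting(s)
print(report(db="fake"))  # prints: fake with 1 setting(s)
```

&lt;p>In the first call &lt;code>get_settings&lt;/code> runs once even though two parameters need it, and in the second call the explicit &lt;code>db&lt;/code> argument skips resolution for that parameter.&lt;/p>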
&lt;p>This pattern allows for better separation of concerns (dependencies are instantiated outside of the function that uses them) and makes unit testing easier. For example, we can create different implementations of the database and settings classes and pass them to the &lt;code>compute_something&lt;/code> function without changing its signature.&lt;/p>
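&lt;p>As a rough illustration of the testing benefit, the sketch below passes a fake database straight into a function that only relies on an interface. The &lt;code>Database&lt;/code> protocol, the &lt;code>FakeDatabase&lt;/code> class, and the &lt;code>fetch_total&lt;/code> method are hypothetical names invented for this example, and the function is left undecorated so the snippet runs on its own.&lt;/p>

```python
from dataclasses import dataclass
from typing import Protocol


class Database(Protocol):
    """The only thing compute_something needs from a database (hypothetical interface)."""

    def fetch_total(self) -> int: ...


@dataclass
class FakeDatabase:
    """In-memory stand-in used only in tests (hypothetical)."""

    total: int

    def fetch_total(self) -> int:
        return self.total


def compute_something(db: Database) -> str:
    # The function depends on the interface, not on how db was constructed.
    return f"total={db.fetch_total()}"


# In a test, hand in the fake directly instead of a real connection.
print(compute_something(db=FakeDatabase(total=42)))  # prints: total=42
```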
&lt;p>Below is the full implementation of dependency injection as a standalone Python module. You can copy it into a &lt;code>depends.py&lt;/code> file and use it in your projects.&lt;/p>
&lt;p>Key Components:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#simple-dependency-injection">&lt;code>Depends&lt;/code>&lt;/a>: Marks a parameter as a dependency to be injected&lt;/li>
&lt;li>&lt;a href="#async-dependencies">&lt;code>inject&lt;/code>&lt;/a>: Decorator that enables dependency injection for a sync or async function&lt;/li>
&lt;li>&lt;a href="#custom-fields">&lt;code>CustomField&lt;/code>&lt;/a>: Base class for creating custom parameter extractors&lt;/li>
&lt;li>&lt;a href="#dependency-overrides">&lt;code>dependency_provider&lt;/code>&lt;/a>: Global provider for managing dependency overrides&lt;/li>
&lt;/ul>
&lt;p>Features:&lt;/p>
&lt;ul>
&lt;li>Automatic dependency resolution and injection&lt;/li>
&lt;li>Support for both sync and async functions&lt;/li>
&lt;li>Dependency caching (can be disabled per dependency)&lt;/li>
&lt;li>Type validation and casting using Pydantic&lt;/li>
&lt;li>Context manager support for resource management&lt;/li>
&lt;li>Custom field extractors for complex parameter handling&lt;/li>
&lt;li>Dependency override system for testing and configuration&lt;/li>
&lt;/ul>
&lt;p>Table of Contents:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#usage">Usage&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#simple-dependency-injection">Simple Dependency Injection&lt;/a>&lt;/li>
&lt;li>&lt;a href="#async-dependencies">Async Dependencies&lt;/a>&lt;/li>
&lt;li>&lt;a href="#dependency-caching">Dependency Caching&lt;/a>&lt;/li>
&lt;li>&lt;a href="#disable-caching">Disable Caching&lt;/a>&lt;/li>
&lt;li>&lt;a href="#custom-fields">Custom Fields&lt;/a>&lt;/li>
&lt;li>&lt;a href="#dependency-overrides">Dependency Overrides&lt;/a>&lt;/li>
&lt;li>&lt;a href="#generator-dependencies-context-managers">Generator Dependencies (Context Managers)&lt;/a>&lt;/li>
&lt;li>&lt;a href="#type-validation-with-pydantic">Type Validation with Pydantic&lt;/a>&lt;/li>
&lt;li>&lt;a href="#disable-type-casting">Disable Type Casting&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#implementation-code">Implementation Code&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="usage">Usage&lt;/h2>
&lt;h3 id="simple-dependency-injection">Simple Dependency Injection&lt;/h3>
&lt;p>Dependencies can be injected into functions using the &lt;code>Depends&lt;/code> class and the &lt;code>inject&lt;/code> decorator.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_database&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;database_connection&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_user&lt;/span>(db: str = Depends(get_database)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;user_from_&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>db&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">handler&lt;/span>(user: str = Depends(get_user)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Hello, &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>user&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">!&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = handler() &lt;span style="color:#0f0"># &amp;#34;Hello, user_from_database_connection!&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(result)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="async-dependencies">Async Dependencies&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> asyncio
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_async_db&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">await&lt;/span> asyncio.sleep(&lt;span style="color:#f60">0.1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;async_database&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">async_handler&lt;/span>(db: str = Depends(get_async_db)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;DB: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>db&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = &lt;span style="color:#f00">await&lt;/span> async_handler() &lt;span style="color:#0f0"># &amp;#34;DB: async_database&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(result)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="dependency-caching">Dependency Caching&lt;/h3>
&lt;p>Caching is enabled by default: if the same dependency is requested more than once while resolving a single call, the cached result is reused instead of executing the function again.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>call_count = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">expensive_operation&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">global&lt;/span> call_count
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    call_count += &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;result_&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>call_count&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">handler&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    a: str = Depends(expensive_operation),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    b: str = Depends(expensive_operation), &lt;span style="color:#0f0"># Same result due to caching&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>a&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">, &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>b&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = handler() &lt;span style="color:#0f0"># &amp;#34;result_1, result_1&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(result) &lt;span style="color:#0f0"># Output: &amp;#34;result_1, result_1&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="disable-caching">Disable Caching&lt;/h3>
&lt;p>Set the &lt;code>use_cache&lt;/code> parameter to &lt;code>False&lt;/code> to disable caching for a specific dependency.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">handler&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    a: str = Depends(expensive_operation, use_cache=&lt;span style="color:#f00">False&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    b: str = Depends(expensive_operation, use_cache=&lt;span style="color:#f00">False&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>a&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">, &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>b&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = handler() &lt;span style="color:#0f0"># &amp;#34;result_1, result_2&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(result) &lt;span style="color:#0f0"># Output: &amp;#34;result_1, result_2&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="custom-fields">Custom Fields&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> HeaderExtractor(CustomField):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self, header_name: str):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        super().&lt;span style="color:#ff0">__init__&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        self.header_name = header_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">use&lt;/span>(self, **kwargs: t.Any) -&amp;gt; t.Dict[str, t.Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#0f0"># Extract from some global context&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        kwargs[self.param_name] = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;header_value_for_&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>self.header_name&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> kwargs
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">api_handler&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    auth: str = HeaderExtractor(&lt;span style="color:#87ceeb">&amp;#34;Authorization&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    content_type: str = HeaderExtractor(&lt;span style="color:#87ceeb">&amp;#34;Content-Type&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> {&lt;span style="color:#87ceeb">&amp;#34;auth&amp;#34;&lt;/span>: auth, &lt;span style="color:#87ceeb">&amp;#34;content_type&amp;#34;&lt;/span>: content_type}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="dependency-overrides">Dependency Overrides&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">original_dep&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;original&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">override_dep&lt;/span>():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;overridden&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">handler&lt;/span>(value: str = Depends(original_dep)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Override dependency&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>dependency_provider.override(original_dep, override_dep)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = handler() &lt;span style="color:#0f0"># &amp;#34;overridden&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Clear overrides&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>dependency_provider.clear()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = handler() &lt;span style="color:#0f0"># &amp;#34;original&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="generator-dependencies-context-managers">Generator Dependencies (Context Managers)&lt;/h3>
&lt;p>A dependency can be a generator function, which allows for resource management (like opening and closing database connections).
In this example, the &lt;code>database_session&lt;/code> generator opens a database connection, yields it to the handler, and closes it after use.
The generator is treated as a context manager, and the setup and cleanup are handled automatically during injection.&lt;/p>
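&lt;p>One plausible way an injector can manage a generator&amp;rsquo;s lifetime is to wrap it with &lt;code>contextlib.contextmanager&lt;/code> and enter it on an &lt;code>ExitStack&lt;/code>. The sketch below uses only the standard library; &lt;code>call_with_session&lt;/code> and the &lt;code>events&lt;/code> list are hypothetical helpers for illustration, not part of the module on this page.&lt;/p>

```python
from contextlib import ExitStack, contextmanager
from typing import Callable, Generator, List

events: List[str] = []  # records the order of setup, use, and cleanup


def database_session() -> Generator[str, None, None]:
    events.append("open")
    try:
        yield "db_session"
    finally:
        events.append("close")  # runs even if the handler raises


def call_with_session(handler: Callable[[str], str]) -> str:
    """Hypothetical helper: enter the generator, call the handler, then clean up."""
    with ExitStack() as stack:
        db = stack.enter_context(contextmanager(database_session)())
        return handler(db)


def use_db(db: str) -> str:
    events.append("use")
    return f"Using {db}"


result = call_with_session(use_db)
print(result)  # prints: Using db_session
print(events)  # prints: ['open', 'use', 'close']
```

&lt;p>The &lt;code>try&lt;/code>/&lt;code>finally&lt;/code> around the &lt;code>yield&lt;/code> is what guarantees the cleanup step when the handler fails.&lt;/p>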
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">database_session&lt;/span>() -&amp;gt; t.Generator[str, &lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#f00">None&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    print(&lt;span style="color:#87ceeb">&amp;#34;Opening connection&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">yield&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;db_session&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    print(&lt;span style="color:#87ceeb">&amp;#34;Closing connection&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">handler&lt;/span>(db: str = Depends(database_session)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Using &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>db&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = handler()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(result) &lt;span style="color:#0f0"># &amp;#34;Using db_session&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Output: Opening connection&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Output: Closing connection&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Returns: &amp;#34;Using db_session&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="type-validation-with-pydantic">Type Validation with Pydantic&lt;/h3>
&lt;p>Injected arguments are validated and cast to their annotated types using Pydantic. In this example, the &lt;code>get_user_id&lt;/code> function returns a string, but the value is cast to an integer when injected. Casting is applied automatically when &lt;code>cast=True&lt;/code> (the default behavior). The &lt;code>Annotated&lt;/code> type from &lt;code>typing&lt;/code> (also re-exported by &lt;code>typing_extensions&lt;/code>) is used to specify the type and the dependency.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> typing &lt;span style="color:#f00">import&lt;/span> Annotated
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_user_id&lt;/span>() -&amp;gt; int:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;123&amp;#34;&lt;/span> &lt;span style="color:#0f0"># Wrong type!&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@inject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">handler&lt;/span>(user_id: Annotated[int, Depends(get_user_id)]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;User ID: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>user_id&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = handler() &lt;span style="color:#0f0"># user_id will be cast to int(123)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="disable-type-casting">Disable Type Casting&lt;/h3>
&lt;p>To disable type casting for a function&amp;rsquo;s injected arguments, pass &lt;code>cast=False&lt;/code> to the &lt;code>inject&lt;/code> decorator.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>@inject(cast=&lt;span style="color:#f00">False&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">handler&lt;/span>(user_id: Annotated[int, Depends(get_user_id)]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;User ID type: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>type(user_id)&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>result = handler() &lt;span style="color:#0f0"># user_id remains as string &amp;#34;123&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="implementation-code">Implementation Code&lt;/h2>
&lt;p>Requirements:&lt;/p>
&lt;ul>
&lt;li>Python 3.11+&lt;/li>
&lt;li>&lt;code>anyio&lt;/code> for async I/O operations&lt;/li>
&lt;li>&lt;code>pydantic&lt;/code> for data validation and settings management&lt;/li>
&lt;li>&lt;code>typing_extensions&lt;/code> for type annotations&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> anyio
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> asyncio
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> inspect
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> functools
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> typing &lt;span style="color:#f00">as&lt;/span> t
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> abc &lt;span style="color:#f00">import&lt;/span> ABC
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> copy &lt;span style="color:#f00">import&lt;/span> deepcopy
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> itertools &lt;span style="color:#f00">import&lt;/span> chain
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> pydantic &lt;span style="color:#f00">import&lt;/span> ConfigDict
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> collections &lt;span style="color:#f00">import&lt;/span> namedtuple
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> functools &lt;span style="color:#f00">import&lt;/span> wraps, partial
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> pydantic &lt;span style="color:#f00">import&lt;/span> BaseModel, create_model
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> typing_extensions &lt;span style="color:#f00">import&lt;/span> Annotated, ParamSpec, get_args, get_origin
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> pydantic._internal._typing_extra &lt;span style="color:#f00">import&lt;/span> try_eval_type &lt;span style="color:#f00">as&lt;/span> evaluate_forwardref
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> contextlib &lt;span style="color:#f00">import&lt;/span> AsyncExitStack, ExitStack, asynccontextmanager, contextmanager
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>P = ParamSpec(&lt;span style="color:#87ceeb">&amp;#34;P&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>T = t.TypeVar(&lt;span style="color:#87ceeb">&amp;#34;T&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>Cls = t.TypeVar(&lt;span style="color:#87ceeb">&amp;#34;Cls&amp;#34;&lt;/span>, bound=&lt;span style="color:#87ceeb">&amp;#34;CustomField&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>default_pydantic_config = {&lt;span style="color:#87ceeb">&amp;#34;arbitrary_types_allowed&amp;#34;&lt;/span>: &lt;span style="color:#f00">True&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_config_base&lt;/span>(config_data: t.Optional[ConfigDict] = &lt;span style="color:#f00">None&lt;/span>) -&amp;gt; ConfigDict:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> config_data or ConfigDict(**default_pydantic_config)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_aliases&lt;/span>(model: t.Type[BaseModel]) -&amp;gt; t.Tuple[str, ...]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> tuple(f.alias or name &lt;span style="color:#f00">for&lt;/span> name, f in model.model_fields.items())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> Depends:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Mark a parameter as a dependency to be injected.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use_cache: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency: t.Callable[..., t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use_cache: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.dependency = dependency
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.use_cache = use_cache
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.cast = cast
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__repr__&lt;/span>(self) -&amp;gt; str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> attr = getattr(self.dependency, &lt;span style="color:#87ceeb">&amp;#34;__name__&amp;#34;&lt;/span>, type(self.dependency).&lt;span style="color:#eedd82">__name__&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache = &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span> &lt;span style="color:#f00">if&lt;/span> self.use_cache &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;, use_cache=False&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>self.&lt;span style="color:#eedd82">__class__&lt;/span>.&lt;span style="color:#eedd82">__name__&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">(&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>attr&lt;span style="color:#87ceeb">}{&lt;/span>cache&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">)&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> CustomField(ABC):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Base class for custom field extractors.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> param_name: t.Optional[str]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> required: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">__slots__&lt;/span> = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;cast&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;param_name&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;required&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;field&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> required: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.cast = cast
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.param_name = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.required = required
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.field = &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">set_param_name&lt;/span>(self: Cls, name: str) -&amp;gt; Cls:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.param_name = name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> self
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">use&lt;/span>(self, /, **kwargs: t.Any) -&amp;gt; t.Dict[str, t.Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> self.param_name, &lt;span style="color:#87ceeb">&amp;#34;You should specify `param_name` before using&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> kwargs
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">use_field&lt;/span>(self, kwargs: t.Dict[str, t.Any]) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> NotImplementedError(&lt;span style="color:#87ceeb">&amp;#34;You should implement `use_field` method.&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Provider for dependency overrides&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> Provider:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Provider for dependency overrides.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides: t.Dict[t.Callable[..., t.Any], t.Callable[..., t.Any]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.dependency_overrides = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">clear&lt;/span>(self) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.dependency_overrides = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">override&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> original: t.Callable[..., t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> override: t.Callable[..., t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.dependency_overrides[original] = override
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @contextmanager
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">scope&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> original: t.Callable[..., t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> override: t.Callable[..., t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; t.Iterator[&lt;span style="color:#f00">None&lt;/span>]:
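&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        self.dependency_overrides[original] = override
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#f00">yield&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">finally&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            &lt;span style="color:#0f0"># Remove the override even if the body of the `with` block raises&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            self.dependency_overrides.pop(original, &lt;span style="color:#f00">None&lt;/span>)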
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>dependency_provider = Provider()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_coroutine_callable&lt;/span>(call: t.Callable[..., t.Any]) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> inspect.isclass(call):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">if&lt;/span> inspect.iscoroutinefunction(call):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    dunder_call = getattr(call, &lt;span style="color:#87ceeb">&amp;#34;__call__&amp;#34;&lt;/span>, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> inspect.iscoroutinefunction(dunder_call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_gen_callable&lt;/span>(call: t.Callable[..., t.Any]) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> inspect.isgeneratorfunction(call):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dunder_call = getattr(call, &lt;span style="color:#87ceeb">&amp;#34;__call__&amp;#34;&lt;/span>, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> inspect.isgeneratorfunction(dunder_call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">is_async_gen_callable&lt;/span>(call: t.Callable[..., t.Any]) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> inspect.isasyncgenfunction(call):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dunder_call = getattr(call, &lt;span style="color:#87ceeb">&amp;#34;__call__&amp;#34;&lt;/span>, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> inspect.isasyncgenfunction(dunder_call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">run_async&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func: t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args: P.args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs: P.kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; T:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> is_coroutine_callable(func):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">await&lt;/span> t.cast(t.Callable[P, t.Awaitable[T]], func)(*args, **kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">await&lt;/span> run_in_threadpool(t.cast(t.Callable[P, T], func), *args, **kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">run_in_threadpool&lt;/span>(func: t.Callable[P, T], *args: P.args, **kwargs: P.kwargs) -&amp;gt; T:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> kwargs:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        func = partial(func, **kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">await&lt;/span> anyio.to_thread.run_sync(func, *args)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_typed_annotation&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation: t.Any,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> globalns: t.Dict[str, t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> locals: t.Dict[str, t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(annotation, str):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation = t.ForwardRef(annotation)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(annotation, t.ForwardRef):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation = evaluate_forwardref(annotation, globalns, locals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> get_origin(annotation) is Annotated and (args := get_args(annotation)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> solved_args = [get_typed_annotation(x, globalns, locals) &lt;span style="color:#f00">for&lt;/span> x in args]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation.__origin__, annotation.__metadata__ = solved_args[&lt;span style="color:#f60">0&lt;/span>], tuple(solved_args[&lt;span style="color:#f60">1&lt;/span>:])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> annotation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">collect_outer_stack_locals&lt;/span>() -&amp;gt; t.Dict[str, t.Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Collect local variables from outer stack frames to resolve type annotations.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> This function walks up the call stack and collects all local variables
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> from frames outside of this module. This is necessary for resolving
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> forward references and string annotations that might reference variables
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> defined in the calling code.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> frame = inspect.currentframe()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> frames: t.List[t.Any] = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> current_filename = &lt;span style="color:#eedd82">__file__&lt;/span> &lt;span style="color:#f00">if&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;__file__&amp;#34;&lt;/span> in globals() &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">while&lt;/span> frame is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> frame_filename = frame.f_code.co_filename
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Skip frames from this module to avoid internal variables&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> current_filename is &lt;span style="color:#f00">None&lt;/span> or frame_filename != current_filename:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> frames.append(frame)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> frame = frame.f_back
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#0f0"># Apply outermost frames first so that inner frames take precedence&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    outer_locals: t.Dict[str, t.Any] = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">for&lt;/span> f in frames[::-&lt;span style="color:#f60">1&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        outer_locals.update(f.f_locals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> outer_locals
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_typed_signature&lt;/span>(call: t.Callable[..., t.Any]) -&amp;gt; t.Tuple[inspect.Signature, t.Any]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> signature = inspect.signature(call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> locals = collect_outer_stack_locals()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call = inspect.unwrap(call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> globalns = getattr(call, &lt;span style="color:#87ceeb">&amp;#34;__globals__&amp;#34;&lt;/span>, {})
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> typed_params = [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> inspect.Parameter(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name=param.name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kind=param.kind,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> default=param.default,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation=get_typed_annotation(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> param.annotation,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> globalns,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> locals,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> param in signature.parameters.values()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> inspect.Signature(typed_params), get_typed_annotation(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> signature.return_annotation,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> globalns,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> locals,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">solve_generator_async&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *sub_args: t.Any, call: t.Callable[..., t.Any], stack: AsyncExitStack, **sub_values: t.Any
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> is_gen_callable(call):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        cm = contextmanager_in_threadpool(contextmanager(call)(*sub_args, **sub_values))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> is_async_gen_callable(call):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cm = asynccontextmanager(call)(*sub_args, **sub_values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">await&lt;/span> stack.enter_async_context(cm)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">solve_generator_sync&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *sub_args: t.Any, call: t.Callable[..., t.Any], stack: ExitStack, **sub_values: t.Any
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cm = contextmanager(call)(*sub_args, **sub_values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> stack.enter_context(cm)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@asynccontextmanager
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">contextmanager_in_threadpool&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cm: t.ContextManager[T],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; t.AsyncGenerator[T, &lt;span style="color:#f00">None&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> exit_limiter = anyio.CapacityLimiter(&lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">yield&lt;/span> &lt;span style="color:#f00">await&lt;/span> run_in_threadpool(cm.&lt;span style="color:#ff0">__enter__&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        ok = bool(&lt;span style="color:#f00">await&lt;/span> anyio.to_thread.run_sync(cm.&lt;span style="color:#ff0">__exit__&lt;/span>, type(e), e, e.__traceback__, limiter=exit_limiter))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not ok:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">await&lt;/span> anyio.to_thread.run_sync(cm.&lt;span style="color:#ff0">__exit__&lt;/span>, &lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#f00">None&lt;/span>, limiter=exit_limiter)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">async_map&lt;/span>(func: t.Callable[..., T], async_iterable: t.AsyncIterable[t.Any]) -&amp;gt; t.AsyncIterable[T]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">for&lt;/span> i in async_iterable:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">yield&lt;/span> func(i)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> solve_wrapper(partial[T]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call: t.Callable[..., T]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__new__&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cls,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func: t.Callable[..., T],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args: t.Any,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs: t.Any,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; &lt;span style="color:#87ceeb">&amp;#34;solve_wrapper[T]&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> len(args) &amp;gt; &lt;span style="color:#f60">0&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;Model should be passed as first argument&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model = args[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self = super().&lt;span style="color:#ff0">__new__&lt;/span>(cls, func, *args, **kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.call = model.call
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> self
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Core Models&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>PriorityPair = namedtuple(&lt;span style="color:#87ceeb">&amp;#34;PriorityPair&amp;#34;&lt;/span>, (&lt;span style="color:#87ceeb">&amp;#34;call&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;dependencies_number&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;dependencies_names&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> ResponseModel(BaseModel, t.Generic[T]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response: T
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> CallModel(t.Generic[P, T]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Model representing a callable with dependency injection.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call: t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_async: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_generator: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model: t.Optional[t.Type[BaseModel]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response_model: t.Optional[t.Type[ResponseModel[T]]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> params: t.Dict[str, t.Tuple[t.Any, t.Any]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> alias_arguments: t.Tuple[str, ...]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependencies: t.Dict[str, &lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies: t.Iterable[&lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sorted_dependencies: t.Tuple[t.Tuple[&lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>, int], ...]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_fields: t.Dict[str, CustomField]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args: t.Tuple[str, ...]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> positional_args: t.Tuple[str, ...]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_positional_arg: t.Optional[str]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_keyword_arg: t.Optional[str]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use_cache: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">__slots__&lt;/span> = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;call&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;is_async&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;is_generator&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;model&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;response_model&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;params&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;alias_arguments&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;keyword_args&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;positional_args&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;var_positional_arg&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;var_keyword_arg&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;dependencies&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;extra_dependencies&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;sorted_dependencies&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;custom_fields&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;use_cache&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;cast&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @property
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">call_name&lt;/span>(self) -&amp;gt; str:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call = inspect.unwrap(self.call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> getattr(call, &lt;span style="color:#87ceeb">&amp;#34;__name__&amp;#34;&lt;/span>, type(call).&lt;span style="color:#eedd82">__name__&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @property
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">flat_params&lt;/span>(self) -&amp;gt; t.Dict[str, t.Tuple[t.Any, t.Any]]:
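&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> params = dict(self.params) &lt;span style="color:#0f0"># copy so that reading this property does not mutate self.params&lt;/span>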
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> d in (*self.dependencies.values(), *self.extra_dependencies):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> params.update(d.flat_params)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> params
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @property
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">flat_dependencies&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Callable[..., t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Tuple[&lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>, t.Tuple[t.Callable[..., t.Any], ...]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> flat: t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Callable[..., t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Tuple[CallModel[..., t.Any], t.Tuple[t.Callable[..., t.Any], ...]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ] = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in (*self.dependencies.values(), *self.extra_dependencies):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> flat.update(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> i.call: (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> i,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tuple(j.call &lt;span style="color:#f00">for&lt;/span> j in i.dependencies.values()),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> flat.update(i.flat_dependencies)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> flat
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> /,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call: t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model: t.Optional[t.Type[BaseModel]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> params: t.Dict[str, t.Tuple[t.Any, t.Any]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response_model: t.Optional[t.Type[ResponseModel[T]]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use_cache: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_async: bool = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_generator: bool = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependencies: t.Optional[t.Dict[str, &lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies: t.Optional[t.Iterable[&lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args: t.Optional[t.List[str]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> positional_args: t.Optional[t.List[str]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_positional_arg: t.Optional[str] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_keyword_arg: t.Optional[str] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_fields: t.Optional[t.Dict[str, CustomField]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.call = call
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.model = model
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> model:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.alias_arguments = get_aliases(model)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.alias_arguments = ()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.keyword_args = tuple(keyword_args or ())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.positional_args = tuple(positional_args or ())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.var_positional_arg = var_positional_arg
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.var_keyword_arg = var_keyword_arg
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.response_model = response_model
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.use_cache = use_cache
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.cast = cast
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.is_async = is_async or is_coroutine_callable(call) or is_async_gen_callable(call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.is_generator = is_generator or is_gen_callable(call) or is_async_gen_callable(call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.dependencies = dependencies or {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.extra_dependencies = extra_dependencies or ()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.custom_fields = custom_fields or {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sorted_dep: t.List[CallModel[..., t.Any]] = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> flat = self.flat_dependencies
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> calls in flat.values():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _sort_dep(sorted_dep, calls, flat)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.sorted_dependencies = tuple((i, len(i.sorted_dependencies)) &lt;span style="color:#f00">for&lt;/span> i in sorted_dep &lt;span style="color:#f00">if&lt;/span> i.use_cache)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> name in chain(self.dependencies.keys(), self.custom_fields.keys()):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> params.pop(name, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.params = params
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_solve&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> /,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args: t.Tuple[t.Any, ...],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies: t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> T,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides: t.Optional[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs: t.Dict[str, t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; t.Generator[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Tuple[t.Sequence[t.Any], t.Dict[str, t.Any], t.Callable[..., t.Any]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Any,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> T,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Two-step generator protocol, driven by solve():&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 1. yield (args, kw, call) and receive the solved keyword arguments;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 2. yield (args_, kwargs_, call) and receive the response of the call.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># A cache hit returns before the first yield, so next() raises StopIteration carrying the cached value.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> dependency_overrides:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call = dependency_overrides.get(self.call, self.call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> self.is_async or not is_coroutine_callable(call), (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;You cannot use async dependency `&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>self.call_name&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">` in a sync main call&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call = self.call
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.use_cache and call in cache_dependencies:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> cache_dependencies[call]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kw: t.Dict[str, t.Any] = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> arg in self.keyword_args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> (v := kwargs.pop(arg, inspect.Parameter.empty)) is not inspect.Parameter.empty:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kw[arg] = v
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.var_keyword_arg is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kw[self.var_keyword_arg] = kwargs
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kw.update(kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> arg in self.positional_args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kw[arg], args = args[&lt;span style="color:#f60">0&lt;/span>], args[&lt;span style="color:#f60">1&lt;/span>:]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args: t.Iterable[str]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.var_positional_arg is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kw[self.var_positional_arg] = args
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args = self.keyword_args
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args = self.keyword_args + self.positional_args
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> arg in keyword_args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not self.cast and arg in self.params:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kw[arg] = self.params[arg][&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not args:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">break&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> arg not in self.dependencies:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kw[arg], args = args[&lt;span style="color:#f60">0&lt;/span>], args[&lt;span style="color:#f60">1&lt;/span>:]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> solved_kw: t.Dict[str, t.Any]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> solved_kw = &lt;span style="color:#f00">yield&lt;/span> args, kw, call
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args_: t.Sequence[t.Any]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.cast:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> self.model, &lt;span style="color:#87ceeb">&amp;#34;Cast should be used only with model&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> casted_model = self.model(**solved_kw)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kwargs_ = {arg: getattr(casted_model, arg, solved_kw.get(arg)) &lt;span style="color:#f00">for&lt;/span> arg in keyword_args}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.var_keyword_arg:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kwargs_.update(getattr(casted_model, self.var_keyword_arg, {}))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.var_positional_arg is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args_ = [getattr(casted_model, arg, solved_kw.get(arg)) &lt;span style="color:#f00">for&lt;/span> arg in self.positional_args]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args_.extend(getattr(casted_model, self.var_positional_arg, ()))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args_ = ()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kwargs_ = {arg: solved_kw.get(arg) &lt;span style="color:#f00">for&lt;/span> arg in keyword_args}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.var_positional_arg is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args_ = tuple(map(solved_kw.get, self.positional_args))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args_ = ()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response: T
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = &lt;span style="color:#f00">yield&lt;/span> args_, kwargs_, call
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.cast and not self.is_generator:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = self._cast_response(response)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.use_cache:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies[call] = response
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> response
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_cast_response&lt;/span>(self, /, value: t.Any) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.response_model is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> self.response_model(response=value).response
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">solve&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> /,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args: t.Tuple[t.Any, ...],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack: ExitStack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies: t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> T,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides: t.Optional[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested: bool = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs: t.Dict[str, t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; T:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast_gen = self._solve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies=cache_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=dependency_overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args, kwargs, _ = next(cast_gen)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> StopIteration &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cached_value: T = e.value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> cached_value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Warm the cache: solve cacheable dependencies in sorted order&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> dep, _ in self.sorted_dependencies:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep.solve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies=cache_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=dependency_overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Solve extra dependencies for their side effects; cached results are reused&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> dep in self.extra_dependencies:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep.solve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies=cache_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=dependency_overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> dep_arg, dep in self.dependencies.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kwargs[dep_arg] = dep.solve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies=cache_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=dependency_overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> custom in self.custom_fields.values():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> custom.field:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom.use_field(kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kwargs = custom.use(**kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> final_args, final_kwargs, call = cast_gen.send(kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.is_generator and nested:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = solve_generator_sync(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *final_args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call=call,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **final_kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = call(*final_args, **final_kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast_gen.send(response)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> StopIteration &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> value: T = e.value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not self.cast or nested or not self.is_generator:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> map(self._cast_response, value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> AssertionError(&lt;span style="color:#87ceeb">&amp;#34;unreachable&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">asolve&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> /,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args: t.Tuple[t.Any, ...],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack: AsyncExitStack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies: t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> T,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides: t.Optional[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested: bool = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs: t.Dict[str, t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; T:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast_gen = self._solve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies=cache_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=dependency_overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> args, kwargs, _ = next(cast_gen)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> StopIteration &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cached_value: T = e.value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> cached_value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Warm the cache: independent dependencies run concurrently, the rest are awaited in order&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep_to_solve: t.List[t.Callable[..., t.Awaitable[t.Any]]] = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">with&lt;/span> anyio.create_task_group() &lt;span style="color:#f00">as&lt;/span> tg:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> dep, subdep in self.sorted_dependencies:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> solve = partial(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep.asolve,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies=cache_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=dependency_overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not subdep:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tg.start_soon(solve)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep_to_solve.append(solve)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in dep_to_solve:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">await&lt;/span> i()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Always get from cache&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> dep in self.extra_dependencies:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">await&lt;/span> dep.asolve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies=cache_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=dependency_overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> dep_arg, dep in self.dependencies.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kwargs[dep_arg] = &lt;span style="color:#f00">await&lt;/span> dep.asolve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies=cache_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=dependency_overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_to_solve: t.List[CustomField] = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">with&lt;/span> anyio.create_task_group() &lt;span style="color:#f00">as&lt;/span> tg:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> custom in self.custom_fields.values():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> custom.field:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tg.start_soon(run_async, custom.use_field, kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_to_solve.append(custom)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> Exception &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> j in custom_to_solve:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kwargs = &lt;span style="color:#f00">await&lt;/span> run_async(j.use, **kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> final_args, final_kwargs, call = cast_gen.send(kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self.is_generator and nested:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = &lt;span style="color:#f00">await&lt;/span> solve_generator_async(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *final_args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call=call,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **final_kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response = &lt;span style="color:#f00">await&lt;/span> run_async(call, *final_args, **final_kwargs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast_gen.send(response)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> StopIteration &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> value: T = e.value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not self.cast or nested or not self.is_generator:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> async_map(self._cast_response, value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> AssertionError(&lt;span style="color:#87ceeb">&amp;#34;unreachable&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_sort_dep&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> collector: t.List[&lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> items: t.Tuple[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Tuple[t.Callable[..., t.Any], ...],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> flat: t.Dict[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Callable[..., t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Tuple[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Tuple[t.Callable[..., t.Any], ...],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model, calls = items
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> model in collector:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not calls:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> position = -&lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in calls:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sub_model, _ = flat[i]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> sub_model not in collector:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _sort_dep(collector, flat[i], flat)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> position = max(collector.index(flat[i][&lt;span style="color:#f60">0&lt;/span>]) &lt;span style="color:#f00">for&lt;/span> i in calls)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> collector.insert(position + &lt;span style="color:#f60">1&lt;/span>, model)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>CUSTOM_ANNOTATIONS = (Depends, CustomField)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">build_call_model&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call: t.Union[t.Callable[P, T], t.Callable[P, t.Awaitable[T]]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use_cache: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_sync: t.Optional[bool] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies: t.Sequence[Depends] = (),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config: t.Optional[ConfigDict] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; CallModel[P, T]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Build a CallModel from a callable.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name = getattr(call, &lt;span style="color:#87ceeb">&amp;#34;__name__&amp;#34;&lt;/span>, type(call).&lt;span style="color:#eedd82">__name__&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_call_async = is_coroutine_callable(call) or is_async_gen_callable(call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> is_sync is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_sync = not is_call_async
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> not (is_sync and is_call_async), &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;You cannot use async dependency `&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>name&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">` at sync main&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> typed_params, return_annotation = get_typed_signature(call)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> (is_call_generator := is_gen_callable(call) or is_async_gen_callable(call)) and (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> return_args := get_args(return_annotation)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> return_annotation = return_args[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> class_fields: t.Dict[str, t.Tuple[t.Any, t.Any]] = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependencies: t.Dict[str, CallModel[..., t.Any]] = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_fields: t.Dict[str, CustomField] = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> positional_args: t.List[str] = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args: t.List[str] = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_positional_arg: t.Optional[str] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_keyword_arg: t.Optional[str] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> param_name, param in typed_params.parameters.items():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep: t.Optional[Depends] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom: t.Optional[CustomField] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> param.annotation is inspect.Parameter.empty:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation = t.Any
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> get_origin(param.annotation) is Annotated:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotated_args = get_args(param.annotation)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> type_annotation = annotated_args[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_annotations = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> regular_annotations = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> arg in annotated_args[&lt;span style="color:#f60">1&lt;/span>:]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(arg, CUSTOM_ANNOTATIONS):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_annotations.append(arg)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> regular_annotations.append(arg)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> len(custom_annotations) &amp;lt;= &lt;span style="color:#f60">1&lt;/span>, (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Cannot specify multiple `Annotated` Custom arguments for `&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>param_name&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">`!&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> next_custom = next(iter(custom_annotations), &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> next_custom is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(next_custom, Depends):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep = next_custom
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> isinstance(next_custom, CustomField):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom = deepcopy(next_custom)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> AssertionError(&lt;span style="color:#87ceeb">&amp;#34;unreachable&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> regular_annotations:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation = param.annotation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation = type_annotation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation = param.annotation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation = param.annotation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> default: t.Any
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> param.kind == inspect.Parameter.VAR_POSITIONAL:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> default = ()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_positional_arg = param_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> param.kind == inspect.Parameter.VAR_KEYWORD:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> default = {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_keyword_arg = param_name
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> param.default is inspect.Parameter.empty:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> default = Ellipsis
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> default = param.default
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(default, Depends):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> dep:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> AssertionError(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;You cannot use `Depends` both in `Annotated` and as a default value&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep, default = default, Ellipsis
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> isinstance(default, CustomField):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> custom:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> AssertionError(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;You cannot use `CustomField` both in `Annotated` and as a default value&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom, default = default, Ellipsis
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> class_fields[param_name] = (annotation, default)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> dep:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not cast:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep.cast = &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(dep.dependency, solve_wrapper):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep.dependency = dep.dependency.call
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependencies[param_name] = build_call_model(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dep.dependency,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast=dep.cast,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use_cache=dep.use_cache,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_sync=is_sync,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config=pydantic_config,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> dep.cast is &lt;span style="color:#f00">True&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> class_fields[param_name] = (annotation, Ellipsis)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args.append(param_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> custom:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> not (is_sync and is_coroutine_callable(custom.use)), (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;You cannot use async custom field `&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>type(custom).&lt;span style="color:#eedd82">__name__&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">` at sync `&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>name&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">`&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom.set_param_name(param_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_fields[param_name] = custom
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> custom.cast is &lt;span style="color:#f00">False&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> annotation = t.Any
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> custom.required:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> class_fields[param_name] = (annotation, default)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> class_fields[param_name] = class_fields.get(param_name, (t.Optional[annotation], &lt;span style="color:#f00">None&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args.append(param_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> param.kind is param.KEYWORD_ONLY:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args.append(param_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> param.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> positional_args.append(param_name)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func_model = create_model(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> __config__=get_config_base(pydantic_config),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **class_fields,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response_model: t.Optional[t.Type[ResponseModel[T]]] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> cast and return_annotation and return_annotation is not inspect.Parameter.empty:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response_model = create_model(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;ResponseModel&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> __config__=get_config_base(pydantic_config),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response=(return_annotation, Ellipsis),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> CallModel(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call=call,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model=func_model,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> response_model=response_model,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> params=class_fields,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast=cast,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use_cache=use_cache,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_async=is_call_async,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_generator=is_call_generator,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependencies=dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> custom_fields=custom_fields,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> positional_args=positional_args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keyword_args=keyword_args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_positional_arg=var_positional_arg,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_keyword_arg=var_keyword_arg,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies=[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> build_call_model(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> d.dependency,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast=d.cast,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use_cache=d.use_cache,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_sync=is_sync,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config=pydantic_config,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> d in extra_dependencies
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> _InjectWrapper(t.Protocol[P, T]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__call__&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func: t.Callable[P, T],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model: t.Optional[CallModel[P, T]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; t.Callable[P, T]: ...
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@t.overload
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">inject&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func: &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies: t.Sequence[Depends] = (),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config: t.Optional[ConfigDict] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides_provider: t.Optional[t.Any] = dependency_provider,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> wrap_model: t.Callable[[CallModel[P, T]], CallModel[P, T]] = &lt;span style="color:#f00">lambda&lt;/span> x: x,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; _InjectWrapper[P, T]: ...
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@t.overload
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">inject&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func: t.Callable[P, T],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies: t.Sequence[Depends] = (),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config: t.Optional[ConfigDict] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides_provider: t.Optional[t.Any] = dependency_provider,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> wrap_model: t.Callable[[CallModel[P, T]], CallModel[P, T]] = &lt;span style="color:#f00">lambda&lt;/span> x: x,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; t.Callable[P, T]: ...
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">inject&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func: t.Optional[t.Callable[P, T]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool = &lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies: t.Sequence[Depends] = (),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config: t.Optional[ConfigDict] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides_provider: t.Optional[t.Any] = dependency_provider,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> wrap_model: t.Callable[[CallModel[P, T]], CallModel[P, T]] = &lt;span style="color:#f00">lambda&lt;/span> x: x,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; t.Union[t.Callable[P, T], _InjectWrapper[P, T]]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Decorator to inject dependencies into a function.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> decorator = _wrap_inject(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides_provider=dependency_overrides_provider,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> wrap_model=wrap_model,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies=extra_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast=cast,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config=pydantic_config,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> func is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> decorator
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> decorator(func)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_wrap_inject&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides_provider: t.Optional[t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> wrap_model: t.Callable[[CallModel[P, T]], CallModel[P, T]],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies: t.Sequence[Depends],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast: bool,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config: t.Optional[ConfigDict],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; _InjectWrapper[P, T]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides_provider
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> and getattr(dependency_overrides_provider, &lt;span style="color:#87ceeb">&amp;#34;dependency_overrides&amp;#34;&lt;/span>, &lt;span style="color:#f00">None&lt;/span>) is not &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> overrides = dependency_overrides_provider.dependency_overrides
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> overrides = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">func_wrapper&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> func: t.Callable[P, T],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model: t.Optional[CallModel[P, T]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) -&amp;gt; t.Callable[P, T]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> model is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> real_model = wrap_model(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> build_call_model(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> call=func,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extra_dependencies=extra_dependencies,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cast=cast,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pydantic_config=pydantic_config,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> real_model = model
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> real_model.is_async:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> injected_wrapper: t.Callable[P, T]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> real_model.is_generator:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> injected_wrapper = solve_wrapper(solve_async_gen, real_model, overrides)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @wraps(func)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">injected_wrapper&lt;/span>(*args: P.args, **kwargs: P.kwargs) -&amp;gt; T:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">with&lt;/span> AsyncExitStack() &lt;span style="color:#f00">as&lt;/span> stack:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> r = &lt;span style="color:#f00">await&lt;/span> real_model.asolve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies={},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> r
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> AssertionError(&lt;span style="color:#87ceeb">&amp;#34;unreachable&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> real_model.is_generator:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> injected_wrapper = solve_wrapper(solve_gen, real_model, overrides)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @wraps(func)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">injected_wrapper&lt;/span>(*args: P.args, **kwargs: P.kwargs) -&amp;gt; T:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> ExitStack() &lt;span style="color:#f00">as&lt;/span> stack:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> r = real_model.solve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies={},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> r
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> AssertionError(&lt;span style="color:#87ceeb">&amp;#34;unreachable&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> injected_wrapper
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> func_wrapper
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> solve_async_gen:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _iter: t.Optional[t.AsyncIterator[t.Any]] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model: &lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> overrides: t.Optional[t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args: t.Any,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs: t.Any,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.call = model
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.args = args
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.kwargs = kwargs
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.overrides = overrides
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__aiter__&lt;/span>(self) -&amp;gt; &lt;span style="color:#87ceeb">&amp;#34;solve_async_gen&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.stack = AsyncExitStack()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> self
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__anext__&lt;/span>(self) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self._iter is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack = self.stack = AsyncExitStack()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">await&lt;/span> self.stack.&lt;span style="color:#ff0">__aenter__&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._iter = t.cast(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.AsyncIterator[t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">await&lt;/span> self.call.asolve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *self.args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=self.overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies={},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **self.kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ).&lt;span style="color:#ff0">__aiter__&lt;/span>(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> r = &lt;span style="color:#f00">await&lt;/span> self._iter.&lt;span style="color:#ff0">__anext__&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> StopAsyncIteration &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">await&lt;/span> self.stack.&lt;span style="color:#ff0">__aexit__&lt;/span>(&lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> r
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> solve_gen:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _iter: t.Optional[t.Iterator[t.Any]] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> model: &lt;span style="color:#87ceeb">&amp;#34;CallModel[..., t.Any]&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> overrides: t.Optional[t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *args: t.Any,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **kwargs: t.Any,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.call = model
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.args = args
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.kwargs = kwargs
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.overrides = overrides
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__iter__&lt;/span>(self) -&amp;gt; &lt;span style="color:#87ceeb">&amp;#34;solve_gen&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.stack = ExitStack()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> self
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__next__&lt;/span>(self) -&amp;gt; t.Any:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> self._iter is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack = self.stack = ExitStack()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.stack.&lt;span style="color:#ff0">__enter__&lt;/span>()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self._iter = t.cast(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t.Iterator[t.Any],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> iter(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.call.solve(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> *self.args,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stack=stack,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dependency_overrides=self.overrides,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_dependencies={},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nested=&lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> **self.kwargs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> r = next(self._iter)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> StopIteration &lt;span style="color:#f00">as&lt;/span> e:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.stack.&lt;span style="color:#ff0">__exit__&lt;/span>(&lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#f00">None&lt;/span>, &lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> r
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Autostore: File Storage Made Simple</title><link>https://asifr.com/autostore/</link><pubDate>Sat, 14 Jun 2025 17:08:21 -0400</pubDate><guid>https://asifr.com/autostore/</guid><description>
&lt;p>&lt;a href="https://pypi.org/project/autostore/">AutoStore&lt;/a> provides a dictionary-like interface for reading and writing files, with caching and support for multiple storage backends.&lt;/p>
&lt;p>AutoStore eliminates the cognitive overhead of managing different file formats, letting you focus on your data and analysis rather than the mechanics of file I/O. It automatically handles file format detection, type inference, and upload/download operations, and it provides a clean, intuitive API for data persistence across local and cloud storage.&lt;/p>
&lt;p>Table of contents:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#why-use-autostore">Why Use AutoStore?&lt;/a>&lt;/li>
&lt;li>&lt;a href="#getting-started">Getting Started&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#basic-usage">Basic Usage&lt;/a>&lt;/li>
&lt;li>&lt;a href="#cloud-storage-s3">Cloud Storage (S3)&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#supported-data-types">Supported Data Types&lt;/a>&lt;/li>
&lt;li>&lt;a href="#configuration-options">Configuration Options&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#s3storageconfig">S3StorageConfig&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#advanced-features">Advanced Features&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#caching-system">Caching System&lt;/a>&lt;/li>
&lt;li>&lt;a href="#custom-data-handlers">Custom Data Handlers&lt;/a>&lt;/li>
&lt;li>&lt;a href="#file-operations">File Operations&lt;/a>&lt;/li>
&lt;li>&lt;a href="#context-management">Context Management&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#multiple-storage-backends">Multiple Storage Backends&lt;/a>&lt;/li>
&lt;li>&lt;a href="#performance-considerations">Performance Considerations&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#large-file-handling">Large File Handling&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#when-to-use-autostore">When to Use AutoStore&lt;/a>&lt;/li>
&lt;li>&lt;a href="#comparison-with-alternatives">Comparison with Alternatives&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="why-use-autostore">Why Use AutoStore?&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Simplicity&lt;/strong>: Store and retrieve data with dictionary syntax. No need to remember APIs for different file formats.&lt;/li>
&lt;li>&lt;strong>Caching&lt;/strong>: Caching system with configurable expiration reduces redundant downloads, especially for cloud storage.&lt;/li>
&lt;li>&lt;strong>Multiple Storage Backends&lt;/strong>: Seamlessly work with local files, S3, and other cloud storage services.&lt;/li>
&lt;li>&lt;strong>Type Detection&lt;/strong>: Automatically infers the best file format based on the data type.&lt;/li>
&lt;li>&lt;strong>Multiple Data Types&lt;/strong>: Built-in support for Polars DataFrames, JSON, CSV, images, PyTorch models, NumPy arrays, and more.&lt;/li>
&lt;li>&lt;strong>Extensible Architecture&lt;/strong>: Pluggable handler system for new data types and storage backends.&lt;/li>
&lt;li>&lt;strong>Performance Optimized&lt;/strong>: Multipart uploads and downloads with configurable chunk sizes and concurrency for large files.&lt;/li>
&lt;li>&lt;strong>Type-Safe Configuration&lt;/strong>: Dataclass-based configuration with IDE support and validation.&lt;/li>
&lt;/ul>
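&lt;p>To make the dictionary-style interface concrete, here is a minimal sketch of the idea using only the standard library. The &lt;code>TinyJsonStore&lt;/code> class is hypothetical and handles JSON only; it illustrates the pattern and is not how AutoStore itself is implemented.&lt;/p>

```python
import json
import tempfile
from pathlib import Path


class TinyJsonStore:
    """Hypothetical dict-like store: each key maps to one JSON file on disk."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Keys may contain "/" to address nested directories.
        return self.root / f"{key}.json"

    def __setitem__(self, key, value):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(value))

    def __getitem__(self, key):
        path = self._path(key)
        if not path.exists():
            raise KeyError(key)
        return json.loads(path.read_text())

    def __contains__(self, key):
        return self._path(key).exists()

    def __delitem__(self, key):
        path = self._path(key)
        if not path.exists():
            raise KeyError(key)
        path.unlink()


# Usage: write, read back, and delete with plain dictionary syntax.
store = TinyJsonStore(tempfile.mkdtemp())
store["experiment/results"] = {"accuracy": 0.95}
loaded = store["experiment/results"]
```

&lt;p>A real store would add format handlers, caching, and remote backends behind the same four methods.&lt;/p>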
&lt;h2 id="getting-started">Getting Started&lt;/h2>
&lt;p>AutoStore requires Python 3.10+ and can be installed via pip.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>pip install autostore
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="basic-usage">Basic Usage&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> autostore &lt;span style="color:#f00">import&lt;/span> AutoStore
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store = AutoStore(&lt;span style="color:#87ceeb">&amp;#34;./data&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Write data - automatically saves with appropriate extensions&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store[&lt;span style="color:#87ceeb">&amp;#34;my_dataframe&amp;#34;&lt;/span>] = df &lt;span style="color:#0f0"># ./data/my_dataframe.parquet&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store[&lt;span style="color:#87ceeb">&amp;#34;config&amp;#34;&lt;/span>] = {&lt;span style="color:#87ceeb">&amp;#34;key&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;value&amp;#34;&lt;/span>} &lt;span style="color:#0f0"># ./data/config.json&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store[&lt;span style="color:#87ceeb">&amp;#34;logs&amp;#34;&lt;/span>] = [{&lt;span style="color:#87ceeb">&amp;#34;event&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;start&amp;#34;&lt;/span>}] &lt;span style="color:#0f0"># ./data/logs.jsonl&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Read data&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>df = store[&lt;span style="color:#87ceeb">&amp;#34;my_dataframe&amp;#34;&lt;/span>] &lt;span style="color:#0f0"># Returns a Polars DataFrame&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>config = store[&lt;span style="color:#87ceeb">&amp;#34;config&amp;#34;&lt;/span>] &lt;span style="color:#0f0"># Returns a dict&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>logs = store[&lt;span style="color:#87ceeb">&amp;#34;logs&amp;#34;&lt;/span>] &lt;span style="color:#0f0"># Returns a list of dicts&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="cloud-storage-s3">Cloud Storage (S3)&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> autostore &lt;span style="color:#f00">import&lt;/span> AutoStore
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> autostore.s3 &lt;span style="color:#f00">import&lt;/span> S3Backend, S3StorageConfig
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Register S3 backend&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>AutoStore.register_backend(&lt;span style="color:#87ceeb">&amp;#34;s3&amp;#34;&lt;/span>, S3Backend)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Configure S3 with caching&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>s3_config = S3StorageConfig(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> region_name=&lt;span style="color:#87ceeb">&amp;#34;us-east-1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_enabled=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_expiry_hours=&lt;span style="color:#f60">12&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> multipart_threshold=&lt;span style="color:#f60">64&lt;/span> * &lt;span style="color:#f60">1024&lt;/span> * &lt;span style="color:#f60">1024&lt;/span> &lt;span style="color:#0f0"># 64MB&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Use S3 storage&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store = AutoStore(&lt;span style="color:#87ceeb">&amp;#34;s3://my-bucket/data/&amp;#34;&lt;/span>, config=s3_config)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store[&lt;span style="color:#87ceeb">&amp;#34;experiment/results&amp;#34;&lt;/span>] = {&lt;span style="color:#87ceeb">&amp;#34;accuracy&amp;#34;&lt;/span>: &lt;span style="color:#f60">0.95&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;epochs&amp;#34;&lt;/span>: &lt;span style="color:#f60">100&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>results = store[&lt;span style="color:#87ceeb">&amp;#34;experiment/results&amp;#34;&lt;/span>] &lt;span style="color:#0f0"># Uses cache on subsequent loads&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="supported-data-types">Supported Data Types&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Data Type&lt;/th>
&lt;th>File Extension&lt;/th>
&lt;th>Description&lt;/th>
&lt;th>Library Required&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Polars DataFrame/LazyFrame&lt;/td>
&lt;td>&lt;code>.parquet&lt;/code>, &lt;code>.csv&lt;/code>&lt;/td>
&lt;td>High-performance DataFrames&lt;/td>
&lt;td>&lt;a href="https://pypi.org/project/polars/">polars&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Python dict/list&lt;/td>
&lt;td>&lt;code>.json&lt;/code>&lt;/td>
&lt;td>Standard JSON serialization&lt;/td>
&lt;td>built-in&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>List of dicts&lt;/td>
&lt;td>&lt;code>.jsonl&lt;/code>&lt;/td>
&lt;td>JSON Lines format&lt;/td>
&lt;td>built-in&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pydantic models&lt;/td>
&lt;td>&lt;code>.pydantic.json&lt;/code>&lt;/td>
&lt;td>Structured data models&lt;/td>
&lt;td>&lt;a href="https://pypi.org/project/pydantic/">pydantic&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Python dataclasses&lt;/td>
&lt;td>&lt;code>.dataclass.json&lt;/code>&lt;/td>
&lt;td>Dataclass serialization&lt;/td>
&lt;td>built-in&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>String data&lt;/td>
&lt;td>&lt;code>.txt&lt;/code>, &lt;code>.html&lt;/code>, &lt;code>.md&lt;/code>&lt;/td>
&lt;td>Plain text files&lt;/td>
&lt;td>built-in&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>NumPy arrays&lt;/td>
&lt;td>&lt;code>.npy&lt;/code>, &lt;code>.npz&lt;/code>&lt;/td>
&lt;td>Numerical data&lt;/td>
&lt;td>&lt;a href="https://pypi.org/project/numpy/">numpy&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>SciPy sparse matrices&lt;/td>
&lt;td>&lt;code>.sparse&lt;/code>&lt;/td>
&lt;td>Sparse matrix data&lt;/td>
&lt;td>&lt;a href="https://pypi.org/project/scipy/">scipy&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PyTorch tensors/models&lt;/td>
&lt;td>&lt;code>.pt&lt;/code>, &lt;code>.pth&lt;/code>&lt;/td>
&lt;td>Deep learning models&lt;/td>
&lt;td>&lt;a href="https://pypi.org/project/torch/">torch&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>PIL/Pillow images&lt;/td>
&lt;td>&lt;code>.png&lt;/code>, &lt;code>.jpg&lt;/code>, etc.&lt;/td>
&lt;td>Image data&lt;/td>
&lt;td>&lt;a href="https://pypi.org/project/pillow/">Pillow&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>YAML data&lt;/td>
&lt;td>&lt;code>.yaml&lt;/code>, &lt;code>.yml&lt;/code>&lt;/td>
&lt;td>Human-readable config files&lt;/td>
&lt;td>&lt;a href="https://pypi.org/project/PyYAML/">PyYAML&lt;/a>&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Any Python object&lt;/td>
&lt;td>&lt;code>.pkl&lt;/code>&lt;/td>
&lt;td>Pickle fallback&lt;/td>
&lt;td>built-in&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
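&lt;p>The table above implies a mapping from Python values to file extensions. The sketch below shows one way such inference could work for the built-in types only. The function name &lt;code>infer_extension&lt;/code>, the order of the checks, and the treatment of empty lists as &lt;code>.json&lt;/code> are assumptions made for illustration, not AutoStore&amp;rsquo;s actual logic.&lt;/p>

```python
import dataclasses


def infer_extension(data):
    """Hypothetical mapping from a Python value to a file extension."""
    # Dataclass instances (not the class itself) get their own JSON flavour.
    if dataclasses.is_dataclass(data) and not isinstance(data, type):
        return ".dataclass.json"
    if isinstance(data, str):
        return ".txt"
    # A non-empty list made up only of dicts is written as JSON Lines.
    if isinstance(data, list) and data and all(isinstance(x, dict) for x in data):
        return ".jsonl"
    if isinstance(data, (dict, list)):
        return ".json"
    return ".pkl"  # pickle fallback for anything else


@dataclasses.dataclass
class Run:
    epochs: int


config_ext = infer_extension({"key": "value"})
logs_ext = infer_extension([{"event": "start"}])
run_ext = infer_extension(Run(epochs=100))
```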
&lt;h2 id="configuration-options">Configuration Options&lt;/h2>
&lt;h3 id="s3storageconfig">S3StorageConfig&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> autostore.s3 &lt;span style="color:#f00">import&lt;/span> S3StorageConfig
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>config = S3StorageConfig(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> aws_access_key_id=&lt;span style="color:#87ceeb">&amp;#34;your-key&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> aws_secret_access_key=&lt;span style="color:#87ceeb">&amp;#34;your-secret&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> region_name=&lt;span style="color:#87ceeb">&amp;#34;us-east-1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_enabled=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cache_expiry_hours=&lt;span style="color:#f60">12&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> multipart_threshold=&lt;span style="color:#f60">64&lt;/span> * &lt;span style="color:#f60">1024&lt;/span> * &lt;span style="color:#f60">1024&lt;/span>, &lt;span style="color:#0f0"># Files larger than this use multipart upload&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> multipart_chunksize=&lt;span style="color:#f60">16&lt;/span> * &lt;span style="color:#f60">1024&lt;/span> * &lt;span style="color:#f60">1024&lt;/span>, &lt;span style="color:#0f0"># Chunk size for multipart uploads&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> max_concurrency=&lt;span style="color:#f60">10&lt;/span> &lt;span style="color:#0f0"># Maximum concurrent uploads/downloads&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="advanced-features">Advanced Features&lt;/h2>
&lt;h3 id="caching-system">Caching System&lt;/h3>
&lt;p>AutoStore includes an intelligent caching system that:&lt;/p>
&lt;ul>
&lt;li>Stores frequently accessed files locally&lt;/li>
&lt;li>Uses ETags for cache validation&lt;/li>
&lt;li>Automatically expires old cache entries&lt;/li>
&lt;li>Avoids redundant downloads from cloud storage&lt;/li>
&lt;/ul>
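&lt;p>The validation and expiry rules in the list above can be sketched as a single check: a cached copy is reused only if it has not expired and its ETag still matches the remote object. The &lt;code>CacheEntry&lt;/code> class and &lt;code>is_cache_valid&lt;/code> function are hypothetical names for this illustration, and the 12-hour default simply mirrors the &lt;code>cache_expiry_hours&lt;/code> value used in the examples.&lt;/p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CacheEntry:
    etag: str            # ETag recorded when the file was cached
    cached_at: datetime  # when the local copy was written


def is_cache_valid(entry, remote_etag, now, expiry_hours=12):
    """Return True when the local copy can be reused without downloading."""
    expired = now - entry.cached_at > timedelta(hours=expiry_hours)
    return not expired and entry.etag == remote_etag


now = datetime(2025, 6, 14, 12, 0)
entry = CacheEntry(etag="abc123", cached_at=now - timedelta(hours=2))

fresh = is_cache_valid(entry, "abc123", now)    # same ETag, 2 hours old
changed = is_cache_valid(entry, "def456", now)  # remote object was replaced
```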
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Cache management&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store.cleanup_cache() &lt;span style="color:#0f0"># Remove expired cache entries&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Check cache status&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>metadata = store.get_metadata(&lt;span style="color:#87ceeb">&amp;#34;large_file&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;File size: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>metadata.size&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> bytes&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;ETag: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>metadata.etag&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="custom-data-handlers">Custom Data Handlers&lt;/h3>
&lt;p>Add support for new data types by creating custom handlers:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> json
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> pathlib &lt;span style="color:#f00">import&lt;/span> Path
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> autostore.autostore &lt;span style="color:#f00">import&lt;/span> DataHandler
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> CustomLogHandler(DataHandler):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle_extension&lt;/span>(self, extension: str) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> extension.lower() == &lt;span style="color:#87ceeb">&amp;#34;.log&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">can_handle_data&lt;/span>(self, data) -&amp;gt; bool:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> isinstance(data, list) and all(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> isinstance(item, dict) and &lt;span style="color:#87ceeb">&amp;#34;timestamp&amp;#34;&lt;/span> in item
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> item in data
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">read_from_file&lt;/span>(self, file_path: Path, file_extension: str):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> logs = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> open(file_path, &lt;span style="color:#87ceeb">&amp;#39;r&amp;#39;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> line in f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> line.strip():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> logs.append(json.loads(line))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> logs
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">write_to_file&lt;/span>(self, data, file_path: Path, file_extension: str):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> file_path.parent.mkdir(parents=&lt;span style="color:#f00">True&lt;/span>, exist_ok=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">with&lt;/span> open(file_path, &lt;span style="color:#87ceeb">&amp;#39;w&amp;#39;&lt;/span>) &lt;span style="color:#f00">as&lt;/span> f:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> entry in data:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> f.write(json.dumps(entry) + &lt;span style="color:#87ceeb">&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">\n&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @property
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">extensions&lt;/span>(self):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> [&lt;span style="color:#87ceeb">&amp;#34;.log&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @property
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">priority&lt;/span>(self):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f60">15&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Register the handler&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store.register_handler(CustomLogHandler())
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="file-operations">File Operations&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Check existence&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">if&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;config&amp;#34;&lt;/span> in store:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">&amp;#34;Config file exists&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># List all files&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">for&lt;/span> key in store.keys():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;File: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>key&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Get file metadata&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>metadata = store.get_metadata(&lt;span style="color:#87ceeb">&amp;#34;large_dataset&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Size: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>metadata.size&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> bytes&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Modified: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>metadata.modified_time&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Copy and move files&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store.copy(&lt;span style="color:#87ceeb">&amp;#34;original&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;backup&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>store.move(&lt;span style="color:#87ceeb">&amp;#34;temp_file&amp;#34;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;permanent_file&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Delete files&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">del&lt;/span> store[&lt;span style="color:#87ceeb">&amp;#34;old_data&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="context-management">Context Management&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Automatic cleanup of temporary files and cache&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">with&lt;/span> AutoStore(&lt;span style="color:#87ceeb">&amp;#34;./data&amp;#34;&lt;/span>, config=config) &lt;span style="color:#f00">as&lt;/span> store:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> store[&lt;span style="color:#87ceeb">&amp;#34;data&amp;#34;&lt;/span>] = large_dataset
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> results = store[&lt;span style="color:#87ceeb">&amp;#34;data&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Temporary files are automatically cleaned up here&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="multiple-storage-backends">Multiple Storage Backends&lt;/h2>
&lt;p>AutoStore supports pluggable storage backends:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Local storage&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>local_store = AutoStore(&lt;span style="color:#87ceeb">&amp;#34;./data&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># S3 storage&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>s3_store = AutoStore(&lt;span style="color:#87ceeb">&amp;#34;s3://bucket/prefix/&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="performance-considerations">Performance Considerations&lt;/h2>
&lt;h3 id="large-file-handling">Large File Handling&lt;/h3>
&lt;p>AutoStore automatically optimizes for large files:&lt;/p>
&lt;ul>
&lt;li>Multipart uploads/downloads for files &amp;gt; 64MB&lt;/li>
&lt;li>Configurable chunk sizes and concurrency&lt;/li>
&lt;li>Streaming operations to minimize memory usage&lt;/li>
&lt;/ul>
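&lt;p>As a worked example of the arithmetic behind multipart transfers, the sketch below computes how many parts a file would be split into. The function &lt;code>plan_upload&lt;/code> is hypothetical; its parameter names and values are borrowed from the &lt;code>S3StorageConfig&lt;/code> example (64MB threshold, 16MB chunks), and the real transfer logic may differ.&lt;/p>

```python
MB = 1024 * 1024


def plan_upload(size_bytes, multipart_threshold=64 * MB, multipart_chunksize=16 * MB):
    """Return the number of parts an upload of size_bytes would be split into."""
    if size_bytes > multipart_threshold:
        # Ceiling division: the last part may be smaller than the chunk size.
        return -(-size_bytes // multipart_chunksize)
    return 1  # files at or below the threshold go up in a single request


parts_small = plan_upload(10 * MB)   # below the threshold
parts_large = plan_upload(200 * MB)  # 200 / 16 = 12.5, rounded up
```

&lt;p>With these values a 200MB file is sent as 13 parts, while a 10MB file is sent in one request.&lt;/p>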
&lt;h2 id="when-to-use-autostore">When to Use AutoStore&lt;/h2>
&lt;p>Choose AutoStore for:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Data science projects&lt;/strong> with mixed file types and cloud storage&lt;/li>
&lt;li>&lt;strong>Data pipelines&lt;/strong> with heterogeneous data sources&lt;/li>
&lt;li>&lt;strong>Rapid prototyping&lt;/strong> where you don&amp;rsquo;t want to think about file formats&lt;/li>
&lt;li>&lt;strong>Consistent data access patterns&lt;/strong> across local and cloud environments&lt;/li>
&lt;li>&lt;strong>Performance optimization&lt;/strong> through intelligent caching&lt;/li>
&lt;li>&lt;strong>Easy extensibility&lt;/strong> for custom data types and storage backends&lt;/li>
&lt;li>&lt;strong>Type-safe configuration&lt;/strong> with dataclass-based settings&lt;/li>
&lt;/ul>
&lt;p>Don&amp;rsquo;t choose AutoStore when:&lt;/p>
&lt;ul>
&lt;li>You need complex queries (use TinyDB or a database instead)&lt;/li>
&lt;li>You only work with one data type consistently&lt;/li>
&lt;li>You need zero dependencies (use Shelve)&lt;/li>
&lt;li>You require advanced database features&lt;/li>
&lt;/ul>
&lt;h2 id="comparison-with-alternatives">Comparison with Alternatives&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Feature&lt;/th>
&lt;th>AutoStore&lt;/th>
&lt;th>Shelve&lt;/th>
&lt;th>DiskCache&lt;/th>
&lt;th>TinyDB&lt;/th>
&lt;th>PickleDB&lt;/th>
&lt;th>SQLiteDict&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Multi-format Support&lt;/strong>&lt;/td>
&lt;td>✅ 12+ formats&lt;/td>
&lt;td>❌ Pickle only&lt;/td>
&lt;td>❌ Pickle only&lt;/td>
&lt;td>❌ JSON only&lt;/td>
&lt;td>❌ JSON only&lt;/td>
&lt;td>❌ Pickle only&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Auto Format Detection&lt;/strong>&lt;/td>
&lt;td>✅ Smart inference&lt;/td>
&lt;td>❌ Manual&lt;/td>
&lt;td>❌ Manual&lt;/td>
&lt;td>❌ Manual&lt;/td>
&lt;td>❌ Manual&lt;/td>
&lt;td>❌ Manual&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Cloud Storage&lt;/strong>&lt;/td>
&lt;td>✅ S3, extensible&lt;/td>
&lt;td>❌ Local only&lt;/td>
&lt;td>❌ Local only&lt;/td>
&lt;td>❌ Local only&lt;/td>
&lt;td>❌ Local only&lt;/td>
&lt;td>❌ Local only&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Intelligent Caching&lt;/strong>&lt;/td>
&lt;td>✅ ETag-based&lt;/td>
&lt;td>❌ None&lt;/td>
&lt;td>✅ Advanced&lt;/td>
&lt;td>❌ None&lt;/td>
&lt;td>❌ None&lt;/td>
&lt;td>❌ None&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Type-Safe Config&lt;/strong>&lt;/td>
&lt;td>✅ Dataclasses&lt;/td>
&lt;td>❌ None&lt;/td>
&lt;td>✅ Classes&lt;/td>
&lt;td>❌ Dicts&lt;/td>
&lt;td>❌ None&lt;/td>
&lt;td>❌ None&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Large File Handling&lt;/strong>&lt;/td>
&lt;td>✅ Multipart&lt;/td>
&lt;td>❌ Limited&lt;/td>
&lt;td>✅ Good&lt;/td>
&lt;td>❌ Limited&lt;/td>
&lt;td>❌ Limited&lt;/td>
&lt;td>❌ Limited&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Extensibility&lt;/strong>&lt;/td>
&lt;td>✅ Handler system&lt;/td>
&lt;td>❌ Limited&lt;/td>
&lt;td>❌ Limited&lt;/td>
&lt;td>✅ Middleware&lt;/td>
&lt;td>❌ Limited&lt;/td>
&lt;td>❌ Limited&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Performance&lt;/strong>&lt;/td>
&lt;td>✅ Cached/Optimized&lt;/td>
&lt;td>🔶 Medium&lt;/td>
&lt;td>✅ Fast&lt;/td>
&lt;td>🔶 Medium&lt;/td>
&lt;td>🔶 Medium&lt;/td>
&lt;td>🔶 Medium&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Standard Library&lt;/strong>&lt;/td>
&lt;td>❌ External&lt;/td>
&lt;td>✅ Built-in&lt;/td>
&lt;td>❌ External&lt;/td>
&lt;td>❌ External&lt;/td>
&lt;td>❌ External&lt;/td>
&lt;td>❌ External&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table></description></item><item><title>Presskit: Database-driven static site generator</title><link>https://asifr.com/presskit/</link><pubDate>Sat, 07 Jun 2025 00:00:00 +0000</pubDate><guid>https://asifr.com/presskit/</guid><description>
&lt;p>Nearly all static site generators convert Markdown to HTML but aren&amp;rsquo;t good at generating multiple pages from database queries. Ideally we would store data in SQLite or Postgres databases and the static site generator would run queries against the database to create a page for each row in the result. Presskit was invented to do just that.&lt;/p>
&lt;p>&lt;a href="https://github.com/asifr/presskit">Presskit&lt;/a> is a static site generator that combines Markdown content with Jinja2 templating and database-driven page generation. It lets you build data-driven static sites by connecting your content to SQLite databases and JSON data sources.&lt;/p>
&lt;p>Table of contents:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#key-features">Key Features&lt;/a>&lt;/li>
&lt;li>&lt;a href="#installation">Installation&lt;/a>&lt;/li>
&lt;li>&lt;a href="#quick-start">Quick Start&lt;/a>&lt;/li>
&lt;li>&lt;a href="#basic-usage">Basic Usage&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#writing-markdown-content">Writing Markdown Content&lt;/a>&lt;/li>
&lt;li>&lt;a href="#creating-html-templates">Creating HTML Templates&lt;/a>&lt;/li>
&lt;li>&lt;a href="#configuration">Configuration&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#template-variables">Template Variables&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#site-variables-site">Site Variables (&lt;code>site.*&lt;/code>)&lt;/a>&lt;/li>
&lt;li>&lt;a href="#build-variables-build">Build Variables (&lt;code>build.*&lt;/code>)&lt;/a>&lt;/li>
&lt;li>&lt;a href="#page-variables-page">Page Variables (&lt;code>page.*&lt;/code>)&lt;/a>&lt;/li>
&lt;li>&lt;a href="#data-variables-data">Data Variables (&lt;code>data.*&lt;/code>)&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#using-variables-in-markdown">Using Variables in Markdown&lt;/a>&lt;/li>
&lt;li>&lt;a href="#data-sources-and-queries">Data Sources and Queries&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#configuring-data-sources">Configuring Data Sources&lt;/a>&lt;/li>
&lt;li>&lt;a href="#adding-queries">Adding Queries&lt;/a>&lt;/li>
&lt;li>&lt;a href="#using-query-data-in-templates">Using Query Data in Templates&lt;/a>&lt;/li>
&lt;li>&lt;a href="#page-level-queries">Page-Level Queries&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#generating-pages">Generating Pages&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#generator-queries">Generator Queries&lt;/a>&lt;/li>
&lt;li>&lt;a href="#generator-configuration">Generator Configuration&lt;/a>&lt;/li>
&lt;li>&lt;a href="#creating-generator-templates">Creating Generator Templates&lt;/a>&lt;/li>
&lt;li>&lt;a href="#nested-queries">Nested Queries&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#commands">Commands&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#build-commands">Build Commands&lt;/a>&lt;/li>
&lt;li>&lt;a href="#development">Development&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;a href="#advanced-configuration">Advanced Configuration&lt;/a>
&lt;ul>
&lt;li>&lt;a href="#full-configuration-example">Full Configuration Example&lt;/a>&lt;/li>
&lt;li>&lt;a href="#custom-filters">Custom Filters&lt;/a>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="key-features">Key Features&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Jinja2 Templating&lt;/strong>: Use Jinja2 variables and logic in both Markdown content and HTML layouts&lt;/li>
&lt;li>&lt;strong>Database Integration&lt;/strong>: Load data from SQLite databases and JSON files&lt;/li>
&lt;li>&lt;strong>Dynamic Page Generation&lt;/strong>: Generate multiple pages automatically from SQLite query results&lt;/li>
&lt;li>&lt;strong>Structured Context&lt;/strong>: Access site metadata, build information, and data through a clean template context&lt;/li>
&lt;/ul>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>pip install presskit
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Alternatively, use &lt;a href="https://docs.astral.sh/uv/">Astral&amp;rsquo;s uv&lt;/a> package manager to install Presskit as a self-contained tool, so you can run it from the command line without activating a virtual environment:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>uv tool install presskit
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="quick-start">Quick Start&lt;/h2>
&lt;ol>
&lt;li>Create a new site directory:&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>mkdir my-site
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cd my-site
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;ol start="2">
&lt;li>Create the basic structure:&lt;/li>
&lt;/ol>
&lt;pre tabindex="0">&lt;code>my-site/
├── presskit.json # Configuration file
├── content/ # Markdown files
├── templates/ # HTML templates
└── public/ # Generated output (created automatically)
&lt;/code>&lt;/pre>&lt;ol start="3">
&lt;li>Build your site:&lt;/li>
&lt;/ol>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>presskit build
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="basic-usage">Basic Usage&lt;/h2>
&lt;h3 id="writing-markdown-content">Writing Markdown Content&lt;/h3>
&lt;p>Create Markdown files in the &lt;code>content/&lt;/code> directory. Each file can include YAML front matter for metadata:&lt;/p>
&lt;pre tabindex="0">&lt;code>---
title: &amp;#34;Welcome to My Site&amp;#34;
description: &amp;#34;A brief introduction&amp;#34;
layout: page
---
# Welcome
This is my **awesome** site built with Presskit!
&lt;/code>&lt;/pre>&lt;h3 id="creating-html-templates">Creating HTML Templates&lt;/h3>
&lt;p>Templates go in the &lt;code>templates/&lt;/code> directory. Here&amp;rsquo;s a basic &lt;code>page.html&lt;/code> template:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-html" data-lang="html">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e5e5e5">&amp;lt;!DOCTYPE html&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;html lang=&lt;span style="color:#87ceeb">&amp;#34;{{ site.language }}&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;head&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;meta charset=&lt;span style="color:#87ceeb">&amp;#34;UTF-8&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;title&amp;gt;{{ page.title or site.title }}&amp;lt;/title&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;meta name=&lt;span style="color:#87ceeb">&amp;#34;description&amp;#34;&lt;/span> content=&lt;span style="color:#87ceeb">&amp;#34;{{ page.description or site.description }}&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/head&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;body&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;header&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;h1&amp;gt;{{ site.title }}&amp;lt;/h1&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/header&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;main&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {{ page.content }}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/main&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;footer&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;p&amp;gt;&amp;amp;copy; {{ build.year }} {{ site.author }}&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/footer&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/body&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/html&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="configuration">Configuration&lt;/h3>
&lt;p>Create a &lt;code>presskit.json&lt;/code> file to configure your site:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;title&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;My Awesome Site&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;description&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;A site built with Presskit&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;author&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;Your Name&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;url&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;https://mysite.com&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;language&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;en-US&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="template-variables">Template Variables&lt;/h2>
&lt;p>Presskit provides a structured context with the following variables available in all templates:&lt;/p>
&lt;h3 id="site-variables-site">Site Variables (&lt;code>site.*&lt;/code>)&lt;/h3>
&lt;ul>
&lt;li>&lt;code>site.title&lt;/code> - Site title&lt;/li>
&lt;li>&lt;code>site.description&lt;/code> - Site description&lt;/li>
&lt;li>&lt;code>site.author&lt;/code> - Site author&lt;/li>
&lt;li>&lt;code>site.url&lt;/code> - Base site URL&lt;/li>
&lt;li>&lt;code>site.version&lt;/code> - Site version&lt;/li>
&lt;li>&lt;code>site.language&lt;/code> - Site language&lt;/li>
&lt;/ul>
&lt;h3 id="build-variables-build">Build Variables (&lt;code>build.*&lt;/code>)&lt;/h3>
&lt;ul>
&lt;li>&lt;code>build.date&lt;/code> - Build date (YYYY-MM-DD)&lt;/li>
&lt;li>&lt;code>build.year&lt;/code> - Build year&lt;/li>
&lt;li>&lt;code>build.timestamp&lt;/code> - Full build timestamp&lt;/li>
&lt;li>&lt;code>build.iso_date&lt;/code> - Build date in ISO format&lt;/li>
&lt;/ul>
&lt;h3 id="page-variables-page">Page Variables (&lt;code>page.*&lt;/code>)&lt;/h3>
&lt;ul>
&lt;li>&lt;code>page.filename&lt;/code> - Page filename without extension&lt;/li>
&lt;li>&lt;code>page.filepath&lt;/code> - Full file path&lt;/li>
&lt;li>&lt;code>page.path&lt;/code> - Clean URL path&lt;/li>
&lt;li>&lt;code>page.layout&lt;/code> - Template layout name&lt;/li>
&lt;li>&lt;code>page.content&lt;/code> - Processed HTML content (in templates)&lt;/li>
&lt;li>&lt;code>page.title&lt;/code> - Page title from front matter&lt;/li>
&lt;li>&lt;code>page.description&lt;/code> - Page description from front matter&lt;/li>
&lt;/ul>
&lt;h3 id="data-variables-data">Data Variables (&lt;code>data.*&lt;/code>)&lt;/h3>
&lt;ul>
&lt;li>&lt;code>data.queries&lt;/code> - Results from named queries&lt;/li>
&lt;li>&lt;code>data.sources&lt;/code> - JSON data sources&lt;/li>
&lt;li>&lt;code>data.page_queries&lt;/code> - Page-specific query results&lt;/li>
&lt;/ul>
&lt;p>Any custom variables defined in your front matter are also available at the top level.&lt;/p>
&lt;h2 id="using-variables-in-markdown">Using Variables in Markdown&lt;/h2>
&lt;p>You can use Jinja2 templating directly in your Markdown content:&lt;/p>
&lt;pre tabindex="0">&lt;code>---
title: About
category: personal
---
# About {{ site.author }}
This site was built on {{ build.date }} and is currently version {{ site.version }}.
{% if category == &amp;#34;personal&amp;#34; %}
This is a personal page about {{ site.author }}.
{% endif %}
&lt;/code>&lt;/pre>&lt;h2 id="data-sources-and-queries">Data Sources and Queries&lt;/h2>
&lt;p>Presskit&amp;rsquo;s data integration connects your static site to data sources, so pages can be generated from data while keeping the performance benefits of a static site.&lt;/p>
&lt;p>This enables data-driven pages that display statistics, reports, or any other structured data. It is ideal for portfolios showcasing project metrics, business dashboards, or documentation sites pulling from APIs.&lt;/p>
&lt;p>It also encourages separation of concerns: content lives in a database, where it can be easily edited, queried, and managed, while your site structure remains in version control.&lt;/p>
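&lt;p>To try the examples in this section you need a SQLite file to query. The following is a minimal sketch in Python, using only the standard-library &lt;code>sqlite3&lt;/code> module, that creates one. The table and column names are assumptions inferred from the example queries in this section, and the sample rows are made up; Presskit does not require this particular schema.&lt;/p>

```python
import sqlite3

# Hypothetical schema inferred from the example queries; adjust to your data.
SCHEMA = """
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    slug TEXT NOT NULL UNIQUE
);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    slug TEXT NOT NULL UNIQUE,
    date TEXT NOT NULL,  -- ISO 8601 date string, so text ordering matches date ordering
    excerpt TEXT,
    category_id INTEGER REFERENCES categories(id)
);
"""


def create_blog_db(path):
    """Create the tables at `path`, insert sample rows, and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO categories (id, name, slug) VALUES (1, 'Python', 'python')"
    )
    conn.executemany(
        "INSERT INTO posts (title, slug, date, excerpt, category_id) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("Hello World", "hello-world", "2025-01-05", "First post.", 1),
            ("Second Post", "second-post", "2025-02-10", "Another post.", 1),
        ],
    )
    conn.commit()
    return conn


# ":memory:" keeps the sketch side-effect free; pass a file path such as
# "data/blog.db" to create a database a site could point at.
conn = create_blog_db(":memory:")
rows = conn.execute(
    "SELECT title, slug, date, excerpt FROM posts ORDER BY date DESC LIMIT 5"
).fetchall()
```

&lt;p>Running the query directly against the database, as in the last statement, is a quick way to check a query before adding it to &lt;code>presskit.json&lt;/code>.&lt;/p>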
&lt;h3 id="configuring-data-sources">Configuring Data Sources&lt;/h3>
&lt;p>Add data sources to your &lt;code>presskit.json&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;title&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;My Blog&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;sources&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;blog_db&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;type&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;sqlite&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;path&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;data/blog.db&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;config&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;type&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;json&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;path&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;data/site-config.json&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;default_source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="adding-queries">Adding Queries&lt;/h3>
&lt;p>Define queries to load data from your sources:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;sources&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;blog_db&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;type&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;sqlite&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;path&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;data/blog.db&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;queries&amp;#34;: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;name&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;recent_posts&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;query&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;SELECT title, slug, date, excerpt FROM posts ORDER BY date DESC LIMIT 5&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;name&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;categories&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;query&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;SELECT categories.name, categories.slug, COUNT(*) AS post_count FROM categories JOIN posts ON categories.id = posts.category_id GROUP BY categories.id&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="using-query-data-in-templates">Using Query Data in Templates&lt;/h3>
&lt;p>Access query results through the &lt;code>data.queries&lt;/code> object:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-html" data-lang="html">&lt;span style="display:flex;">&lt;span>&amp;lt;section class=&lt;span style="color:#87ceeb">&amp;#34;recent-posts&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;h2&amp;gt;Recent Posts&amp;lt;/h2&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {% for post in data.queries.recent_posts %}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;article&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;h3&amp;gt;&amp;lt;a href=&lt;span style="color:#87ceeb">&amp;#34;/posts/{{ post.slug }}&amp;#34;&lt;/span>&amp;gt;{{ post.title }}&amp;lt;/a&amp;gt;&amp;lt;/h3&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;time&amp;gt;{{ post.date | date_format(&amp;#39;%B %d, %Y&amp;#39;) }}&amp;lt;/time&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;p&amp;gt;{{ post.excerpt }}&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/article&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {% endfor %}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/section&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;aside class=&lt;span style="color:#87ceeb">&amp;#34;categories&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;h3&amp;gt;Categories&amp;lt;/h3&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;ul&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {% for category in data.queries.categories %}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;li&amp;gt;&amp;lt;a href=&lt;span style="color:#87ceeb">&amp;#34;/category/{{ category.slug }}&amp;#34;&lt;/span>&amp;gt;{{ category.name }} ({{ category.post_count }})&amp;lt;/a&amp;gt;&amp;lt;/li&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {% endfor %}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/ul&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/aside&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="page-level-queries">Page-Level Queries&lt;/h3>
&lt;p>You can also define queries in individual Markdown files:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-markdown" data-lang="markdown">&lt;span style="display:flex;">&lt;span>---
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>title: &amp;#34;Author Profile&amp;#34;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>queries:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> author_posts:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> source: &amp;#34;blog_db&amp;#34;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> query: &amp;#34;SELECT title, slug, date FROM posts WHERE author_id = {{ author_id }} ORDER BY date DESC&amp;#34;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>variables:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> author_id: 123
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>---
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0;font-weight:bold"># {{ page.title }}
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0;font-weight:bold">&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb;font-weight:bold">## Recent Posts by This Author
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb;font-weight:bold">&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>{% for post in data.page_queries.author_posts %}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">-&lt;/span> [{{ post.title }}](/posts/{{ post.slug }}) - {{ post.date | date_format(&amp;#39;%Y-%m-%d&amp;#39;) }}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>{% endfor %}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This example defines a page-level query, &lt;code>author_posts&lt;/code>, that uses the &lt;code>author_id&lt;/code> variable to fetch posts by a specific author. The results are available under &lt;code>data.page_queries&lt;/code>.&lt;/p>
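&lt;p>To make the substitution step concrete, here is a minimal Python sketch of filling &lt;code>{{ author_id }}&lt;/code> into the query text and running the result against SQLite. The &lt;code>render_query&lt;/code> helper and the sample table are hypothetical, and the regular expression is a simplified stand-in for Jinja2 rendering, not Presskit&amp;rsquo;s actual implementation.&lt;/p>

```python
import re
import sqlite3


def render_query(template, variables):
    """Replace each {{ name }} placeholder with the matching variable's value."""
    def substitute(match):
        # A missing variable raises KeyError instead of silently rendering empty.
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical sample data standing in for the blog database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (title TEXT, slug TEXT, date TEXT, author_id INTEGER)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [
        ("Older", "older", "2024-03-01", 123),
        ("Newer", "newer", "2025-03-01", 123),
        ("Other", "other", "2025-01-01", 456),
    ],
)

# The query and variable from the front matter example.
query = (
    "SELECT title, slug, date FROM posts "
    "WHERE author_id = {{ author_id }} ORDER BY date DESC"
)
sql = render_query(query, {"author_id": 123})
author_posts = conn.execute(sql).fetchall()
```

&lt;p>Because the value is spliced into the SQL text rather than bound as a parameter, variables used this way should only come from trusted sources such as your own front matter.&lt;/p>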
&lt;h2 id="generating-pages">Generating Pages&lt;/h2>
&lt;p>The most powerful feature of Presskit is generating multiple pages from database queries.&lt;/p>
&lt;h3 id="generator-queries">Generator Queries&lt;/h3>
&lt;p>Mark a query as a generator to create multiple pages:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;queries&amp;#34;: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;name&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_posts&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;query&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;SELECT title, slug, content, date, author FROM posts WHERE published = 1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;generator&amp;#34;: &lt;span style="color:#f00">true&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;template&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;post&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;output_path&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;posts/#{slug}&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="generator-configuration">Generator Configuration&lt;/h3>
&lt;ul>
&lt;li>&lt;code>generator: true&lt;/code> - Marks this as a page generator&lt;/li>
&lt;li>&lt;code>template&lt;/code> - Template to use for generated pages&lt;/li>
&lt;li>&lt;code>output_path&lt;/code> - Path pattern with placeholders like &lt;code>#{field_name}&lt;/code>&lt;/li>
&lt;/ul>
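&lt;p>The &lt;code>#{field_name}&lt;/code> placeholders can be pictured as a per-row string substitution. The sketch below is an illustration in Python under that assumption. &lt;code>expand_output_path&lt;/code> is a hypothetical helper, not part of Presskit&amp;rsquo;s API, and how the resulting path maps to a file under the output directory is not shown.&lt;/p>

```python
import re


def expand_output_path(pattern, row):
    """Fill each #{field} placeholder in `pattern` from the `row` dict."""
    def substitute(match):
        field = match.group(1)
        if field not in row:
            # Fail loudly when the pattern names a column the query did not select.
            raise KeyError(f"output_path references unknown column: {field}")
        return str(row[field])

    return re.sub(r"#\{(\w+)\}", substitute, pattern)


# Made-up rows standing in for the results of a generator query.
rows = [
    {"title": "Hello World", "slug": "hello-world"},
    {"title": "Second Post", "slug": "second-post"},
]
paths = [expand_output_path("posts/#{slug}", row) for row in rows]
```

&lt;p>Viewed this way, every field named in &lt;code>output_path&lt;/code> has to be a column selected by the generator query, and the column should be unique across rows so that two pages do not expand to the same path.&lt;/p>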
&lt;h3 id="creating-generator-templates">Creating Generator Templates&lt;/h3>
&lt;p>Create a template for your generated pages (&lt;code>templates/post.html&lt;/code>):&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-html" data-lang="html">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e5e5e5">&amp;lt;!DOCTYPE html&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;html&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;head&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;title&amp;gt;{{ title }} | {{ site.title }}&amp;lt;/title&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/head&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;body&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;article&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;h1&amp;gt;{{ title }}&amp;lt;/h1&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;time&amp;gt;{{ date | date_format(&amp;#39;%B %d, %Y&amp;#39;) }}&amp;lt;/time&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;div class=&lt;span style="color:#87ceeb">&amp;#34;content&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {{ content | safe }}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;p&amp;gt;By {{ author }}&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/article&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;nav&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;a href=&lt;span style="color:#87ceeb">&amp;#34;/&amp;#34;&lt;/span>&amp;gt;← Back to Home&amp;lt;/a&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/nav&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/body&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/html&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="nested-queries">Nested Queries&lt;/h3>
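&lt;p>You can create parent-child query relationships. Name the child query with the parent&amp;rsquo;s name as a prefix (here &lt;code>authors.posts&lt;/code>); the child can then reference columns of each parent row, such as &lt;code>{{ id }}&lt;/code>:&lt;/p>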
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;queries&amp;#34;: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;name&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;authors&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;query&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;SELECT id, name, bio, slug FROM authors&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;name&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;authors.posts&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;query&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;SELECT title, slug, date FROM posts WHERE author_id = {{ id }} ORDER BY date DESC&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
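&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In templates, each parent row exposes its child results under the child query&amp;rsquo;s name (here &lt;code>author.posts&lt;/code>):&lt;/p>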
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-html" data-lang="html">&lt;span style="display:flex;">&lt;span>{% for author in data.queries.authors %}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;div class=&lt;span style="color:#87ceeb">&amp;#34;author&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;h2&amp;gt;{{ author.name }}&amp;lt;/h2&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;p&amp;gt;{{ author.bio }}&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;h3&amp;gt;Posts by {{ author.name }}&amp;lt;/h3&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {% for post in author.posts %}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;p&amp;gt;&amp;lt;a href=&lt;span style="color:#87ceeb">&amp;#34;/posts/{{ post.slug }}&amp;#34;&lt;/span>&amp;gt;{{ post.title }}&amp;lt;/a&amp;gt; - {{ post.date }}&amp;lt;/p&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {% endfor %}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>{% endfor %}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="commands">Commands&lt;/h2>
&lt;h3 id="build-commands">Build Commands&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Build entire site&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>presskit build
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Build specific file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>presskit build content/about.md
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Execute queries and cache results&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>presskit data
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Generate pages from generator queries&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>presskit generate
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Check query cache status&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>presskit status
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="development">Development&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Start development server&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>presskit server
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Clean build artifacts&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>presskit clean
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="advanced-configuration">Advanced Configuration&lt;/h2>
&lt;h3 id="full-configuration-example">Full Configuration Example&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-json" data-lang="json">&lt;span style="display:flex;">&lt;span>{
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;title&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;My Blog&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;description&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;A blog about web development&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;author&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;Jane Developer&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;url&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;https://myblog.dev&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;version&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;2.1.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;language&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;en-US&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;content_dir&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;content&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;templates_dir&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;templates&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;output_dir&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;public&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;cache_dir&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;.cache&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;default_template&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;page&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;markdown_extension&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;md&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;workers&amp;#34;: &lt;span style="color:#f60">8&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;server_host&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;0.0.0.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;server_port&amp;#34;: &lt;span style="color:#f60">8000&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;sources&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;blog_db&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;type&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;sqlite&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;path&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;data/blog.sqlite3&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;config&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;type&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;json&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;path&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;data/config.json&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;default_source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;variables&amp;#34;: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;environment&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;production&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;analytics_id&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;GA-XXXXX&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;queries&amp;#34;: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;name&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;posts&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;query&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;SELECT * FROM posts WHERE status = &amp;#39;published&amp;#39; ORDER BY date DESC&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;generator&amp;#34;: &lt;span style="color:#f00">true&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;template&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;post&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;output_path&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog/#{slug}&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;name&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;recent_posts&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;source&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;blog_db&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;#34;query&amp;#34;: &lt;span style="color:#87ceeb">&amp;#34;SELECT title, slug, excerpt, date FROM posts WHERE status = &amp;#39;published&amp;#39; ORDER BY date DESC LIMIT 5&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="custom-filters">Custom Filters&lt;/h3>
&lt;p>Presskit includes useful Jinja2 filters:&lt;/p>
&lt;ul>
&lt;li>&lt;code>date_format(format)&lt;/code>&lt;br>
Format dates (e.g., &lt;code>{{ date | date_format('%B %d, %Y') }}&lt;/code>)&lt;/li>
&lt;/ul></description></item><item><title>Google Chrome On-Device Embedding Model</title><link>https://asifr.com/chrome-on-device-embedding-model/</link><pubDate>Tue, 13 May 2025 00:00:00 +0000</pubDate><guid>https://asifr.com/chrome-on-device-embedding-model/</guid><description>
&lt;p>Google Chrome bundles a text embedding model used to cluster browsing history as part of the &lt;a href="https://privacysandbox.google.com/private-advertising/topics/web">Topics API&lt;/a> and for semantic search. They also ship a number of other models with Chrome.&lt;/p>
&lt;p>First I had to track down these on-device models. I started at the usual place where apps store their application data: the &lt;code>~/Library/Application Support/&lt;/code> folder on macOS. Running &lt;code>find ~/Library/Application\ Support/ -maxdepth 4 -name 'optimization*'&lt;/code> returns the folder I was looking for: &lt;code>~/Library/Application Support/Google/Chrome/optimization_guide_model_store&lt;/code>. The &lt;a href="https://chromium.googlesource.com/chromium/src/+/refs/heads/main/components/optimization_guide">Chromium source code&lt;/a> says this about the optimization guide: &amp;ldquo;The optimization guide component contains code for processing hints and machine learning models received from the remote Chrome Optimization Guide Service&amp;rdquo;.&lt;/p>
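&lt;p>The same search can be sketched in Python. This is a minimal sketch, not part of the original exploration: the helper name &lt;code>find_models&lt;/code> is hypothetical, and the path is the macOS location mentioned above, so other platforms will need a different root.&lt;/p>

```python
from pathlib import Path


def find_models(root):
    """Return (path, size in bytes) for every .tflite file under root, largest first."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []  # Chrome not installed here, or a different OS layout
    return sorted(
        ((p, p.stat().st_size) for p in root.rglob("*.tflite")),
        key=lambda item: item[1],
        reverse=True,
    )


# macOS location from the text above (an assumption on any other platform).
store = "~/Library/Application Support/Google/Chrome/optimization_guide_model_store"
for path, size in find_models(store):
    print(f"{size / 1e6:8.2f} MB  {path}")
```

&lt;p>Sorting by size makes it easy to spot the largest model, which is a reasonable first guess for the embedding model.&lt;/p>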
&lt;p>Models are stored as &lt;a href="https://ai.google.dev/edge/litert">tflite&lt;/a> files, a format that has since been rebranded as LiteRT:&lt;/p>
&lt;blockquote>
&lt;p>LiteRT (short for Lite Runtime), formerly known as TensorFlow Lite, is Google&amp;rsquo;s high-performance runtime for on-device AI. LiteRT is also the format Google uses to ship their Gemma3N edge-device models.&lt;/p>&lt;/blockquote>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-console" data-lang="console">&lt;span style="display:flex;">&lt;span>$ cd ~/Library/Application&lt;span style="color:#87ceeb">\ &lt;/span>Support/Google/Chrome/optimization_guide_model_store
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>$ tree -L &lt;span style="color:#f60">4&lt;/span> .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>.
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 13
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── 205CA176C885321E
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 15
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── 255B83C178FA9DD9
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── override_list.pb.gz
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── VERSION.txt
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 2
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── 2DFDB6405E512759
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 20
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── 91EF641BEE15B40C
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 24
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── 1B8C0D25285420AB
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── enus_denylist_encoded_241007.txt
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── vocab_en-us.txt
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 25
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── C278361C6A5A6107
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── visual_model_desktop.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 26
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── 141FCE0CF6807549
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 43
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E234446CB5BACE99
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── sentencepiece.model
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── 45
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── 063B3FABDDE10CE8
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   └── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>└── 9
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> └── E6DC4029A1E4B4C1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> └── B5ECF67C32B2BD47
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ├── model-info.pb
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> └── model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>31 directories, 26 files
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>One of the models (&lt;code>255B83C178FA9DD9&lt;/code>) belongs to the &amp;ldquo;&lt;a href="https://privacysandbox.com/intl/en_us/proposals/topics/">Browsing Topics Privacy Sandbox feature&lt;/a>&amp;rdquo;, which maps recent browsing history to a set of interest-based categories used to serve relevant ads. The Topics API is a replacement for the &lt;a href="https://github.com/google/ads-privacy/blob/master/proposals/FLoC/FLOC-Whitepaper-Google.pdf">FLoC proposal&lt;/a>. The &lt;code>VERSION.txt&lt;/code> file refers to a taxonomy, which presumably relates to the interest-based categories.&lt;/p>
&lt;p>Based on the &lt;a href="https://chromium.googlesource.com/chromium/src/+/refs/heads/main/chrome/renderer/safe_browsing/phishing_classifier_browsertest.cc#73">Chromium source code&lt;/a>, the &lt;code>visual_model_desktop.tflite&lt;/code> file appears to be part of a phishing classifier.&lt;/p>
&lt;p>There is also a &lt;code>sentencepiece.model&lt;/code> file, which is a &lt;a href="https://github.com/google/sentencepiece">text tokenizer&lt;/a>. The accompanying &lt;code>model.tflite&lt;/code> file is the largest of the models at 107 MB.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-console" data-lang="console">&lt;span style="display:flex;">&lt;span>$ find . -type f -name &lt;span style="color:#87ceeb">&amp;#34;*model.tflite&amp;#34;&lt;/span> -print0 | xargs -0 du -h
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>64K ./20/E6DC4029A1E4B4C1/91EF641BEE15B40C/model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>16K ./9/E6DC4029A1E4B4C1/B5ECF67C32B2BD47/model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>132K ./45/E6DC4029A1E4B4C1/063B3FABDDE10CE8/model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>8.0K ./26/E6DC4029A1E4B4C1/141FCE0CF6807549/model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>107M ./43/E6DC4029A1E4B4C1/E234446CB5BACE99/model.tflite &amp;lt;---
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>4.4M ./24/E6DC4029A1E4B4C1/1B8C0D25285420AB/model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>2.6M ./15/E6DC4029A1E4B4C1/255B83C178FA9DD9/model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>384K ./2/E6DC4029A1E4B4C1/2DFDB6405E512759/model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>1.2M ./13/E6DC4029A1E4B4C1/205CA176C885321E/model.tflite
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>180K ./25/E6DC4029A1E4B4C1/C278361C6A5A6107/model.tflite
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Let&amp;rsquo;s load up the sentencepiece tokenizer and the model and get embeddings.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> tensorflow &lt;span style="color:#f00">as&lt;/span> tf
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> sentencepiece &lt;span style="color:#f00">as&lt;/span> spm
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>tokenizer = spm.SentencePieceProcessor(model_file=&lt;span style="color:#87ceeb">&amp;#39;sentencepiece.model&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>interpreter = tf.lite.Interpreter(model_path=&lt;span style="color:#87ceeb">&amp;#39;model.tflite&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>interpreter.allocate_tensors()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>input_details = interpreter.get_input_details()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>output_details = interpreter.get_output_details()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">get_embedding&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> interpreter: tf.lite.Interpreter,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tokenizer: spm.SentencePieceProcessor,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> text: str,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; np.ndarray:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Embedding vector for a given text. Max token length is 64.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> interpreter: TFLite interpreter.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> tokenizer: SentencePiece tokenizer.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> text: Text to be tokenized and embedded.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> np.ndarray: Embedding vector of shape (768,).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Example:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; tokenizer = spm.SentencePieceProcessor(model_file=&amp;#34;sentencepiece.model&amp;#34;)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; interpreter = tf.lite.Interpreter(model_path=&amp;#34;model.tflite&amp;#34;)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; get_embedding(interpreter, tokenizer, &amp;#34;New York&amp;#34;) # (768,)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> input_shape = input_details[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#87ceeb">&amp;#34;shape&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> seq_len = input_shape[&lt;span style="color:#f60">1&lt;/span>] &lt;span style="color:#f00">if&lt;/span> len(input_shape) &amp;gt; &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#f60">64&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tokens = tokenizer.encode(text, out_type=int)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tokens = np.pad(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tokens[:seq_len],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (&lt;span style="color:#f60">0&lt;/span>, max(&lt;span style="color:#f60">0&lt;/span>, seq_len - len(tokens))),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;constant&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )[np.newaxis, :] &lt;span style="color:#0f0"># (1, 64)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> interpreter.set_tensor(input_details[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#87ceeb">&amp;#34;index&amp;#34;&lt;/span>], tokens.astype(np.int32))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> interpreter.invoke()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> embedding = interpreter.get_tensor(output_details[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#87ceeb">&amp;#34;index&amp;#34;&lt;/span>]) &lt;span style="color:#0f0"># (1, 768)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> embedding[&lt;span style="color:#f60">0&lt;/span>] &lt;span style="color:#0f0"># (768,)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>text = &lt;span style="color:#87ceeb">&amp;#34;New York&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>embedding = get_embedding(interpreter, tokenizer, text)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># [-2.49071661e-02 2.41535041e-03 -1.51733104e-02 -1.12882648e-02...]&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This model has a maximum sequence length of 64 input tokens and outputs a 768-dimensional vector. That is a small input size, roughly 200-220 characters.&lt;/p></description></item><item><title>Automatic statistics selection</title><link>https://asifr.com/automatic-statistics-selection/</link><pubDate>Sat, 10 May 2025 00:00:00 +0000</pubDate><guid>https://asifr.com/automatic-statistics-selection/</guid><description>
&lt;p>Given a dataset with mixed variable types, how do you decide which statistical test to run? We can reduce this down to three things: the type of the variable being tested (numeric or categorical), the type of the covariate (numeric, categorical, or none), and how many categories the covariate has.&lt;/p>
&lt;p>The following decision table maps these inputs to the appropriate test. Each function takes a Polars DataFrame, runs the test, and returns a structured result with a test statistic, critical value, p-value, and a plain-language conclusion.&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Variable&lt;/th>
&lt;th>Covariate&lt;/th>
&lt;th>NumCategories&lt;/th>
&lt;th>Type&lt;/th>
&lt;th>Test&lt;/th>
&lt;th>StatementExample&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>Categorical&lt;/td>
&lt;td>&amp;gt;2&lt;/td>
&lt;td>Medians&lt;/td>
&lt;td>Kruskal-Wallis&lt;/td>
&lt;td>&amp;ldquo;Median &amp;lt;variable&amp;gt; does not vary across values of &amp;lt;covariate&amp;gt;.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>Categorical&lt;/td>
&lt;td>2&lt;/td>
&lt;td>Medians&lt;/td>
&lt;td>Mann-Whitney Test&lt;/td>
&lt;td>&amp;ldquo;Median &amp;lt;variable&amp;gt; varies across values of &amp;lt;covariate&amp;gt;.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>Categorical&lt;/td>
&lt;td>&amp;gt;2&lt;/td>
&lt;td>Means&lt;/td>
&lt;td>ANOVA&lt;/td>
&lt;td>&amp;ldquo;Mean &amp;lt;variable&amp;gt; does not vary across values of &amp;lt;covariate&amp;gt;.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>Categorical&lt;/td>
&lt;td>2&lt;/td>
&lt;td>Means&lt;/td>
&lt;td>T-test&lt;/td>
&lt;td>&amp;ldquo;Mean &amp;lt;variable&amp;gt; varies across values of &amp;lt;covariate&amp;gt;.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Categorical&lt;/td>
&lt;td>Numeric&lt;/td>
&lt;td>2+&lt;/td>
&lt;td>Chi-squared&lt;/td>
&lt;td>Chi-squared test&lt;/td>
&lt;td>&amp;ldquo;&amp;lt;variable&amp;gt; and quantile of &amp;lt;covariate&amp;gt; are not independent.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>None&lt;/td>
&lt;td>1&lt;/td>
&lt;td>Distribution&lt;/td>
&lt;td>Kolmogorov-Smirnov&lt;/td>
&lt;td>&amp;ldquo;&amp;lt;variable&amp;gt; does not depart from a uniform distribution.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>Numeric&lt;/td>
&lt;td>6&lt;/td>
&lt;td>Distribution&lt;/td>
&lt;td>Kolmogorov-Smirnov&lt;/td>
&lt;td>&amp;ldquo;The distribution of &amp;lt;variable&amp;gt; varies across quantiles of &amp;lt;covariate&amp;gt;.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>Categorical&lt;/td>
&lt;td>2+&lt;/td>
&lt;td>Distribution&lt;/td>
&lt;td>Kolmogorov-Smirnov&lt;/td>
&lt;td>&amp;ldquo;The distribution of &amp;lt;variable&amp;gt; does not vary across values of &amp;lt;covariate&amp;gt;.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>Numeric&lt;/td>
&lt;td>2&lt;/td>
&lt;td>Correlation&lt;/td>
&lt;td>Pearson Correlation&lt;/td>
&lt;td>&amp;ldquo;A positive correlation exists between &amp;lt;variable&amp;gt; and &amp;lt;covariate&amp;gt;.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Numeric&lt;/td>
&lt;td>Categorical&lt;/td>
&lt;td>2+&lt;/td>
&lt;td>Rank Correlation&lt;/td>
&lt;td>Spearman Correlation&lt;/td>
&lt;td>&amp;ldquo;A negative correlation exists between &amp;lt;variable&amp;gt; and &amp;lt;covariate&amp;gt;.&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>When the variable is categorical and the covariate is numeric, the test is a Chi-squared test of independence.
The numeric covariate is binned into quantiles, and the Chi-squared test is performed on the contingency table of the
categorical variable and the binned covariate. When both the categorical variable and the binned covariate have only 2 levels, the contingency table is 2x2 and we use Fisher&amp;rsquo;s exact test instead, reporting its z-score and p-value. Otherwise, we report the Chi-squared test statistic and p-value.&lt;/p>
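&lt;p>The bin-then-test step can be sketched with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the implementation from this post: the function name &lt;code>independence_test&lt;/code>, the default of four bins, and the random sample data are all hypothetical, and the sketch returns only the p-value.&lt;/p>

```python
import numpy as np
from scipy import stats


def independence_test(variable, covariate, n_bins=4):
    """Test a categorical variable against a quantile-binned numeric covariate."""
    variable = np.asarray(variable)
    covariate = np.asarray(covariate, dtype=float)
    # Interior quantile cut points, e.g. 25%, 50%, 75% for four bins.
    probs = np.linspace(0, 1, n_bins + 1)[1:-1]
    bins = np.digitize(covariate, np.quantile(covariate, probs))
    # Contingency table: one row per category, one column per bin.
    table = np.array([
        [np.sum(np.logical_and(variable == c, bins == b)) for b in np.unique(bins)]
        for c in np.unique(variable)
    ])
    if table.shape == (2, 2):
        # 2x2 table: Fisher's exact test.
        _, p_value = stats.fisher_exact(table)
        return "fisher", table, p_value
    _, p_value, _, _ = stats.chi2_contingency(table)
    return "chi2", table, p_value


rng = np.random.default_rng(0)
variable = rng.choice(["a", "b", "c"], size=300)  # hypothetical categories
covariate = rng.normal(size=300)                  # hypothetical numeric covariate
name, table, p_value = independence_test(variable, covariate)
```

&lt;p>With three categories and four bins the table is 3x4, so the Chi-squared branch runs. Passing a two-level variable with &lt;code>n_bins=2&lt;/code> takes the Fisher branch instead.&lt;/p>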
&lt;p>The Kolmogorov-Smirnov test can be used in three ways:&lt;/p>
&lt;ol>
&lt;li>Single variable: Tests whether a numeric variable follows a uniform distribution (normalized to [0,1] range).&lt;/li>
&lt;li>With numeric covariate: Uses two-sample KS tests to compare distributions across 6 quantiles of the covariate.&lt;/li>
&lt;li>With categorical covariate: Uses two-sample KS tests to compare distributions across category groups.&lt;/li>
&lt;/ol>
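&lt;p>The three uses map onto two SciPy calls, &lt;code>stats.kstest&lt;/code> and &lt;code>stats.ks_2samp&lt;/code>. The sketch below uses hypothetical random data. For the numeric covariate it compares only the lowest and highest of the 6 quantile bins, which is an arbitrary choice for illustration and not necessarily how the results are combined in this post.&lt;/p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=500)                  # hypothetical numeric variable
covariate = rng.normal(size=500)          # hypothetical numeric covariate
group = rng.choice(["a", "b"], size=500)  # hypothetical categorical covariate

# 1. Single variable: min-max normalize to [0, 1], then compare with Uniform(0, 1).
x01 = (x - x.min()) / (x.max() - x.min())
uniform_result = stats.kstest(x01, "uniform")

# 2. Numeric covariate: split into 6 quantile bins, then run a two-sample test.
edges = np.quantile(covariate, np.linspace(0, 1, 7)[1:-1])  # 5 interior cut points
bin_index = np.digitize(covariate, edges)                   # bin labels 0..5
lowest_vs_highest = stats.ks_2samp(x[bin_index == 0], x[bin_index == 5])

# 3. Categorical covariate: two-sample test between category groups.
a_vs_b = stats.ks_2samp(x[group == "a"], x[group == "b"])

for label, result in [
    ("uniform", uniform_result),
    ("bins 0 vs 5", lowest_vs_highest),
    ("a vs b", a_vs_b),
]:
    print(f"{label}: D={result.statistic:.3f}, p={result.pvalue:.3g}")
```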
&lt;p>The Pearson Correlation test examines the linear relationship between two numeric variables. It returns the correlation
coefficient as the test statistic along with degrees of freedom (n-2). The test determines if a significant positive
or negative correlation exists between the variables.&lt;/p>
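&lt;p>As a minimal sketch of where the n-2 degrees of freedom come in, the correlation coefficient can be converted to a t statistic with n-2 degrees of freedom, and the resulting two-sided p-value matches the one from &lt;code>stats.pearsonr&lt;/code>. The data here is hypothetical.&lt;/p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=200)                       # hypothetical numeric variable
y = 0.8 * x + rng.normal(scale=0.5, size=200)  # hypothetical, positively related covariate

r, p_value = stats.pearsonr(x, y)
dof = len(x) - 2

# Under H0 (no correlation), t = r * sqrt(dof / (1 - r^2)) follows a t distribution.
t_stat = r * np.sqrt(dof / (1 - r**2))
p_from_t = 2 * stats.t.sf(abs(t_stat), dof)

direction = "positive" if r > 0 else "negative"
print(f"r={r:.3f}, dof={dof}, p={p_value:.3g}, direction={direction}")
```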
&lt;p>The Spearman Correlation test examines the monotonic relationship between a numeric variable and a categorical covariate.
The categorical covariate is converted to numeric ranks (sorted alphabetically) and then Spearman&amp;rsquo;s rank correlation
is computed. This test is useful for detecting ordinal relationships between numeric and categorical variables.&lt;/p>
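&lt;p>A minimal sketch of the alphabetical encoding, using hypothetical data: &lt;code>np.unique&lt;/code> returns the labels in sorted order, and &lt;code>return_inverse=True&lt;/code> gives each observation&amp;rsquo;s position in that order, which can be passed to &lt;code>stats.spearmanr&lt;/code>.&lt;/p>

```python
import numpy as np
from scipy import stats

values = np.array([1.0, 2.5, 2.0, 4.0, 3.5, 5.0])              # hypothetical numeric variable
labels = np.array(["low", "low", "mid", "mid", "top", "top"])  # hypothetical categorical covariate

# np.unique sorts the labels alphabetically; codes holds each label's index in that order.
categories, codes = np.unique(labels, return_inverse=True)

rho, p_value = stats.spearmanr(values, codes)
print(list(categories), codes.tolist(), round(float(rho), 3))
```

&lt;p>The alphabetical order only reflects a real ordering when the labels happen to sort that way. For example, &amp;ldquo;high&amp;rdquo; sorts before &amp;ldquo;low&amp;rdquo;, so labels like those would need an explicit ordering for the sign of the correlation to be meaningful.&lt;/p>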
&lt;p>We can turn these rules into a function that automatically selects and runs the appropriate test based on the variable types and covariate characteristics. The function will return a structured result with the test statistic, critical value, p-value, and a plain-language conclusion.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np &lt;span style="color:#0f0"># type: ignore&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> polars &lt;span style="color:#f00">as&lt;/span> pl
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> scipy &lt;span style="color:#f00">import&lt;/span> stats &lt;span style="color:#0f0"># type: ignore&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> pydantic &lt;span style="color:#f00">import&lt;/span> BaseModel
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> typing &lt;span style="color:#f00">import&lt;/span> List, Union, Optional
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> MeanDiffBootstrapResult(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mean: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci_lb: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci_ub: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> EffectSizeResult(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> standardized_difference: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci_lower: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci_upper: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> EffectSizeCategoricalResult(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> standardized_difference: Optional[float]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci_lower: Optional[float]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci_upper: Optional[float]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> StandardizedDifferenceVariable(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> variable: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> type: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> standardized_difference: Optional[float]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci_lower: Optional[float]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci_upper: Optional[float]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> StandardizedDifferenceResult(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> variables: List[StandardizedDifferenceVariable]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> StatisticalTestResult(BaseModel):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant: bool
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> degrees_of_freedom: Optional[int] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> contingency_table: Optional[dict] = &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">bin_numeric_column&lt;/span>(df: pl.DataFrame, column: str, n_bins: int = &lt;span style="color:#f60">6&lt;/span>, method: str = &lt;span style="color:#87ceeb">&amp;#34;quantile&amp;#34;&lt;/span>) -&amp;gt; pl.DataFrame:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Bin a numeric column in a dataframe using either quantile or equidistant binning.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> For quantile binning, extreme outliers (below 1st percentile and above 99th percentile)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> are removed when calculating quantile thresholds, but all data points are included in
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> the final bins. Outliers are placed in the appropriate edge bins.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df (pl.DataFrame): Input dataframe
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> column (str): Name of the numeric column to bin
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> n_bins (int): Number of bins to create (default: 6)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> method (str): Binning method - &amp;#34;quantile&amp;#34; or &amp;#34;equidistant&amp;#34; (default: &amp;#34;quantile&amp;#34;)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> pl.DataFrame: Dataframe with an additional column &amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{column}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34; containing bin labels
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> method == &lt;span style="color:#87ceeb">&amp;#34;quantile&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use quantile-based binning with outlier winsorization&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Get non-null values for outlier detection&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> non_null_values = df.filter(pl.col(column).is_not_null())[column].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(non_null_values) == &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle all-null column&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> df.with_columns(pl.lit(&lt;span style="color:#f00">None&lt;/span>, dtype=pl.Utf8).alias(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>column&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(set(non_null_values)) == &lt;span style="color:#f60">1&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle constant column case&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> df.with_columns(pl.lit(&lt;span style="color:#87ceeb">&amp;#34;Q1&amp;#34;&lt;/span>).alias(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>column&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Compute the 1st and 99th percentiles used to trim extreme outliers (values are dropped, not clipped as in strict winsorization)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p1 = np.percentile(non_null_values, &lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p99 = np.percentile(non_null_values, &lt;span style="color:#f60">99&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Filter out extreme outliers for quantile calculation only&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> winsorized_values = non_null_values[(non_null_values &amp;gt;= p1) &amp;amp; (non_null_values &amp;lt;= p99)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(winsorized_values) &amp;lt; n_bins:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># If too few values after winsorization, use all values&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> winsorized_values = non_null_values
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate quantile probs based on winsorized data, then apply to all data&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> quantile_probs = [i / n_bins &lt;span style="color:#f00">for&lt;/span> i in range(n_bins + &lt;span style="color:#f60">1&lt;/span>)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> quantile_values = [np.percentile(winsorized_values, q * &lt;span style="color:#f60">100&lt;/span>) &lt;span style="color:#f00">for&lt;/span> q in quantile_probs]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Adjust edges to include all original data (including outliers)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> quantile_values[&lt;span style="color:#f60">0&lt;/span>] = non_null_values.min() - &lt;span style="color:#f60">1e-10&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> quantile_values[-&lt;span style="color:#f60">1&lt;/span>] = non_null_values.max() + &lt;span style="color:#f60">1e-10&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Assign bins manually using half-open intervals [lower, upper)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">assign_bin&lt;/span>(value):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> value is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#f00">None&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(len(quantile_values) - &lt;span style="color:#f60">1&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> quantile_values[i] &amp;lt;= value &amp;lt; quantile_values[i + &lt;span style="color:#f60">1&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Q&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle edge case for maximum value&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Q&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>n_bins&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Apply binning using map_elements&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> df.with_columns(pl.col(column).map_elements(assign_bin, return_dtype=pl.Utf8).alias(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>column&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">elif&lt;/span> method == &lt;span style="color:#87ceeb">&amp;#34;equidistant&amp;#34;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># For n_bins output bins, create n_bins - 1 internal break points (excluding the min/max endpoints)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># This allows Polars to automatically create exactly n_bins intervals: (-inf, break1], (break1, break2], ..., (breakN-1, inf]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Provide n_bins labels to match these intervals&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use equidistant binning&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> col_min = df[column].min()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> col_max = df[column].max()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle null cases&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> col_min is &lt;span style="color:#f00">None&lt;/span> or col_max is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> df.with_columns(pl.lit(&lt;span style="color:#f00">None&lt;/span>, dtype=pl.Utf8).alias(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>column&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> col_min == col_max:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle constant column case&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> df.with_columns(pl.lit(&lt;span style="color:#87ceeb">&amp;#34;Bin1&amp;#34;&lt;/span>).alias(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>column&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create equidistant bins&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># For n_bins, we need n_bins-1 internal break points (excluding min/max)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Polars creates bins: (-inf, break1], (break1, break2], ..., (breakN-1, inf]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> n_bins == &lt;span style="color:#f60">1&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Special case: only one bin&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> df.with_columns(pl.lit(&lt;span style="color:#87ceeb">&amp;#34;Bin1&amp;#34;&lt;/span>).alias(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>column&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bin_edges = np.linspace(float(col_min), float(col_max), n_bins + &lt;span style="color:#f60">1&lt;/span>)[&lt;span style="color:#f60">1&lt;/span>:-&lt;span style="color:#f60">1&lt;/span>] &lt;span style="color:#0f0"># exclude endpoints&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> bin_labels = [&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Bin&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#f00">for&lt;/span> i in range(n_bins)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use cut for equidistant binning&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> df.with_columns(pl.col(column).cut(bin_edges.tolist(), labels=bin_labels).alias(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>column&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Unknown binning method: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>method&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">. Use &amp;#39;quantile&amp;#39; or &amp;#39;equidistant&amp;#39;&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">mean_diff&lt;/span>(group_a, group_b) -&amp;gt; float:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> np.mean(group_a) - np.mean(group_b)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">drop_outliers_array&lt;/span>(arr, quantiles):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lower = np.percentile(arr, quantiles[&lt;span style="color:#f60">0&lt;/span>] * &lt;span style="color:#f60">100&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper = np.percentile(arr, quantiles[&lt;span style="color:#f60">1&lt;/span>] * &lt;span style="color:#f60">100&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Drop outliers outside the quantiles&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mask = (arr &amp;gt;= lower) &amp;amp; (arr &amp;lt;= upper)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> arr[mask]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">bootstrap_mean_diff&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> metric_col: str,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> intervention_col: str,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n_resamples: int = &lt;span style="color:#f60">1000&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> drop_outliers: bool = &lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> outlier_quantiles: List[float] = [&lt;span style="color:#f60">0.01&lt;/span>, &lt;span style="color:#f60">0.99&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; MeanDiffBootstrapResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Calculate the mean difference between two groups using bootstrapping.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df (Union[pl.DataFrame, pl.LazyFrame]): The input DataFrame containing the data.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> metric_col (str): The name of the column containing the metric to analyze.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> intervention_col (str): The name of the column indicating the intervention group (1 for treatment, 0 for control).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> n_resamples (int): The number of bootstrap resamples to perform. Default is 1000.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> drop_outliers (bool): Whether to drop outliers outside the specified quantiles. Default is False.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> outlier_quantiles (List[float]): The quantiles to use for outlier removal. Default is [0.01, 0.99].
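&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Values outside these quantiles will be dropped from the analysis.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> MeanDiffBootstrapResult: Mean of the bootstrap distribution (treatment minus control) and the bounds of the percentile confidence interval (ci_lb, ci_ub).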
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Separate the data into two groups based on the &amp;#39;intervention&amp;#39; column&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dx = df.select([metric_col, intervention_col]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dx = df.lazy().select([metric_col, intervention_col]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Group_a is the treatment group (intervention_col == 1)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Group_b is the control group (intervention_col == 0)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> drop_outliers:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Apply outlier removal to each group after splitting&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_a = dx.filter(pl.col(intervention_col) == &lt;span style="color:#f60">1&lt;/span>)[metric_col].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_b = dx.filter(pl.col(intervention_col) == &lt;span style="color:#f60">0&lt;/span>)[metric_col].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_a = drop_outliers_array(group_a, outlier_quantiles)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_b = drop_outliers_array(group_b, outlier_quantiles)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_a = dx.filter(pl.col(intervention_col) == &lt;span style="color:#f60">1&lt;/span>)[metric_col].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_b = dx.filter(pl.col(intervention_col) == &lt;span style="color:#f60">0&lt;/span>)[metric_col].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Bootstrap resampling&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> res = stats.bootstrap(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (group_a, group_b),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> statistic=mean_diff,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n_resamples=n_resamples,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> method=&lt;span style="color:#87ceeb">&amp;#34;percentile&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci = res.confidence_interval
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mu = float(np.mean(res.bootstrap_distribution))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> MeanDiffBootstrapResult(mean=mu, ci_lb=ci.low, ci_ub=ci.high)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">effect_size_continuous&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame], group_col: str, var_col: str, coverage: float = &lt;span style="color:#f60">0.95&lt;/span>, decimals: int = &lt;span style="color:#f60">3&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; EffectSizeResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Effect size for continuous variables.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df (Union[pl.DataFrame, pl.LazyFrame]): Dataframe
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> group_col (str): Group column name with treatment and control assignments
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> var_col (str): Variable column name
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> coverage (float): Coverage of the confidence interval
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> decimals (int): Number of decimals to round the effect size
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> EffectSizeResult: Effect size and confidence interval
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Examples:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ```python
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df = pl.DataFrame(
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> {
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;group&amp;#34;: [&amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;],
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;var&amp;#34;: [0.1, 0.2, 0.13, 0.4, 0.25, 0.6, 0.17, 0.8],
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> }
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> )
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> effect_size_continuous(df, &amp;#34;group&amp;#34;, &amp;#34;var&amp;#34;) # EffectSizeResult(standardized_difference=1.793, ci_lower=0.152, ci_upper=3.434)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ```
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">import&lt;/span> math
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">import&lt;/span> scipy.stats &lt;span style="color:#f00">as&lt;/span> stats
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_subset = df.select([group_col, var_col]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_subset = df.select([group_col, var_col]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">assert&lt;/span> df_subset[group_col].n_unique() == &lt;span style="color:#f60">2&lt;/span>, &lt;span style="color:#87ceeb">&amp;#34;Only 2 groups allowed&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate means and variances by group&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_stats = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_subset.group_by(group_col)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> .agg(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pl.col(var_col).mean().alias(&lt;span style="color:#87ceeb">&amp;#34;mean&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pl.col(var_col).var(ddof=&lt;span style="color:#f60">1&lt;/span>).alias(&lt;span style="color:#87ceeb">&amp;#34;var&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pl.col(var_col).count().alias(&lt;span style="color:#87ceeb">&amp;#34;count&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> .sort(group_col)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> m = group_stats[&lt;span style="color:#87ceeb">&amp;#34;mean&amp;#34;&lt;/span>].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> v = group_stats[&lt;span style="color:#87ceeb">&amp;#34;var&amp;#34;&lt;/span>].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = group_stats[&lt;span style="color:#87ceeb">&amp;#34;count&amp;#34;&lt;/span>].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stdiff = (m[&lt;span style="color:#f60">1&lt;/span>] - m[&lt;span style="color:#f60">0&lt;/span>]) / math.sqrt(v.sum() / &lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stdiff = round(stdiff, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># compute confidence interval&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Number of observations in group 0, group 1, and total&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n0, n1 = n[&lt;span style="color:#f60">0&lt;/span>], n[&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total = n0 + n1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Computing the corresponding value from the standard Normal for specified CI coverage&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> percentile = &lt;span style="color:#f60">1&lt;/span> - ((&lt;span style="color:#f60">1&lt;/span> - coverage) / &lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> zscore = stats.norm.ppf(percentile)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Computing the standard error of the standardized difference&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> deviation = np.sqrt((total / (n0 * n1)) + ((stdiff**&lt;span style="color:#f60">2&lt;/span>) / (&lt;span style="color:#f60">2&lt;/span> * total)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Constructing the CIs using the Z-score and standard deviation&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lower_ci = stdiff - zscore * deviation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper_ci = stdiff + zscore * deviation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lower_ci = round(lower_ci, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper_ci = round(upper_ci, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> EffectSizeResult(standardized_difference=stdiff, ci_lower=lower_ci, ci_upper=upper_ci)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">effect_size_categorical&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame], group_col: str, var_col: str, coverage: float = &lt;span style="color:#f60">0.95&lt;/span>, decimals: int = &lt;span style="color:#f60">3&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; EffectSizeCategoricalResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Effect size for binary or multi-level categorical variables.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df (pl.DataFrame or pl.LazyFrame): Dataframe
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> group_col (str): Group column name with treatment and control assignments
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> var_col (str): Variable column name, binary (2 levels) or categorical (&amp;gt;2 levels)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> coverage (float): Coverage of the confidence interval
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> decimals (int): Number of decimals to round the effect size
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> EffectSizeCategoricalResult: Effect size for each level
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Examples:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ```python
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df = pl.DataFrame({
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;group&amp;#34;: [&amp;#34;T&amp;#34;, &amp;#34;C&amp;#34;, &amp;#34;T&amp;#34;, &amp;#34;C&amp;#34;, &amp;#34;T&amp;#34;, &amp;#34;C&amp;#34;],
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;var&amp;#34;: [&amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;A&amp;#34;, &amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;C&amp;#34;],
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> })
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> effect_size_categorical(df, &amp;#34;group&amp;#34;, &amp;#34;var&amp;#34;) # EffectSizeCategoricalResult(standardized_difference=1.069, ci_lower=-0.642, ci_upper=2.78)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ```
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">import&lt;/span> scipy.stats &lt;span style="color:#f00">as&lt;/span> stats
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df = df.select([group_col, var_col]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df = df.select([group_col, var_col]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate value counts for each group-variable combination&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> value_counts = df.group_by([group_col, var_col]).agg(pl.len().alias(&lt;span style="color:#87ceeb">&amp;#34;count&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate total counts per group to compute proportions&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_totals = df.group_by(group_col).agg(pl.len().alias(&lt;span style="color:#87ceeb">&amp;#34;total&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Join to get proportions&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> probs_df = value_counts.join(group_totals, on=group_col).with_columns(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> (pl.col(&lt;span style="color:#87ceeb">&amp;#34;count&amp;#34;&lt;/span>) / pl.col(&lt;span style="color:#87ceeb">&amp;#34;total&amp;#34;&lt;/span>)).alias(&lt;span style="color:#87ceeb">&amp;#34;prob&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Pivot to get groups as rows and variable levels as columns&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Get unique values for both dimensions&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups = df[group_col].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> levels = df[var_col].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create a matrix of probabilities&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prob_matrix = np.zeros((len(groups), len(levels)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, group in enumerate(groups):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> j, level in enumerate(levels):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prob_row = probs_df.filter((pl.col(group_col) == group) &amp;amp; (pl.col(var_col) == level))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(prob_row) &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prob_matrix[i, j] = prob_row[&lt;span style="color:#87ceeb">&amp;#34;prob&amp;#34;&lt;/span>][&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group0 = prob_matrix[&lt;span style="color:#f60">0&lt;/span>, :]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group1 = prob_matrix[&lt;span style="color:#f60">1&lt;/span>, :]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># prob_matrix.shape = (2, levels); the transpose has shape (levels, 2)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prob_matrix_t = prob_matrix.transpose()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Computing the probability difference between group 1 and group 0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Dropping the 1st difference as there are n-1 degrees of freedom&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prob_difference = np.subtract(group1, group0)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prob_difference = np.delete(prob_difference, (&lt;span style="color:#f60">0&lt;/span>), axis=&lt;span style="color:#f60">0&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Binary case: exactly one probability difference remains after dropping the first&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(prob_difference) == &lt;span style="color:#f60">1&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle case with only 2 levels (binary categorical)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use simple difference between the probabilities for the second level&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prob_diff_scalar = group1[&lt;span style="color:#f60">1&lt;/span>] - group0[&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># For binary case, use simple variance formula&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_group0 = group0[&lt;span style="color:#f60">1&lt;/span>] * (&lt;span style="color:#f60">1&lt;/span> - group0[&lt;span style="color:#f60">1&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_group1 = group1[&lt;span style="color:#f60">1&lt;/span>] * (&lt;span style="color:#f60">1&lt;/span> - group1[&lt;span style="color:#f60">1&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pooled_var = (var_group0 + var_group1) / &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> pooled_var == &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Perfect separation case&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> EffectSizeCategoricalResult(standardized_difference=&lt;span style="color:#f00">None&lt;/span>, ci_lower=&lt;span style="color:#f00">None&lt;/span>, ci_upper=&lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stdiff = abs(prob_diff_scalar) / np.sqrt(pooled_var)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stdiff = round(stdiff, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># compute confidence interval for binary case&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_sizes = df.group_by(group_col).agg(pl.len().alias(&lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>)).sort(group_col)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n0, n1 = group_sizes[&lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>].to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total = n0 + n1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> percentile = &lt;span style="color:#f60">1&lt;/span> - ((&lt;span style="color:#f60">1&lt;/span> - coverage) / &lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> zscore = stats.norm.ppf(percentile)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> deviation = np.sqrt((total / (n0 * n1)) + ((stdiff**&lt;span style="color:#f60">2&lt;/span>) / (&lt;span style="color:#f60">2&lt;/span> * total)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lower_ci = stdiff - zscore * deviation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper_ci = stdiff + zscore * deviation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lower_ci = round(lower_ci, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper_ci = round(upper_ci, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> EffectSizeCategoricalResult(standardized_difference=stdiff, ci_lower=lower_ci, ci_upper=upper_ci)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Computing the covariance matrix&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> levels_count = prob_matrix_t.shape[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariance = np.zeros(shape=(levels_count, levels_count))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> row in range(levels_count):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> col in range(levels_count):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> row == col:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariance[row][col] = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> prob_matrix_t[row][&lt;span style="color:#f60">0&lt;/span>] * (&lt;span style="color:#f60">1&lt;/span> - prob_matrix_t[row][&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> + prob_matrix_t[row][&lt;span style="color:#f60">1&lt;/span>] * (&lt;span style="color:#f60">1&lt;/span> - prob_matrix_t[row][&lt;span style="color:#f60">1&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ) / &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariance[row][col] = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> -(prob_matrix_t[row][&lt;span style="color:#f60">0&lt;/span>] * prob_matrix_t[col][&lt;span style="color:#f60">0&lt;/span>] + prob_matrix_t[row][&lt;span style="color:#f60">1&lt;/span>] * prob_matrix_t[col][&lt;span style="color:#f60">1&lt;/span>]) / &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Dropping the 1st row and column as there are n-1 degrees of freedom,&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># then computing the inverse of the reduced covariance matrix&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariance = np.delete(covariance, (&lt;span style="color:#f60">0&lt;/span>), axis=&lt;span style="color:#f60">0&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariance = np.delete(covariance, (&lt;span style="color:#f60">0&lt;/span>), axis=&lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> inverse = np.linalg.inv(covariance)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> np.linalg.LinAlgError:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> EffectSizeCategoricalResult(standardized_difference=&lt;span style="color:#f00">None&lt;/span>, ci_lower=&lt;span style="color:#f00">None&lt;/span>, ci_upper=&lt;span style="color:#f00">None&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Computing the standardized difference (using Mahalanobis distance)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stdiff = np.sqrt(np.linalg.multi_dot([prob_difference.T, inverse, prob_difference]))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stdiff = round(stdiff, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># compute confidence interval&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Number of observations in group 0, group 1, and total&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_sizes = df.group_by(group_col).agg(pl.len().alias(&lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>)).sort(group_col)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n0, n1 = group_sizes[&lt;span style="color:#87ceeb">&amp;#34;size&amp;#34;&lt;/span>].to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total = n0 + n1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Computing the corresponding value from the standard Normal for specified CI coverage&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> percentile = &lt;span style="color:#f60">1&lt;/span> - ((&lt;span style="color:#f60">1&lt;/span> - coverage) / &lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> zscore = stats.norm.ppf(percentile)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Computing the standard error of the standardized difference&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> deviation = np.sqrt((total / (n0 * n1)) + ((stdiff**&lt;span style="color:#f60">2&lt;/span>) / (&lt;span style="color:#f60">2&lt;/span> * total)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Constructing the CIs using the Z-score and standard deviation&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lower_ci = stdiff - zscore * deviation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper_ci = stdiff + zscore * deviation
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lower_ci = round(lower_ci, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper_ci = round(upper_ci, decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> EffectSizeCategoricalResult(standardized_difference=stdiff, ci_lower=lower_ci, ci_upper=upper_ci)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">standardized_difference&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> intervention: str,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> categorical: Optional[Union[str, List[str]]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> continuous: Optional[Union[str, List[str]]] = &lt;span style="color:#f00">None&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> coverage: float = &lt;span style="color:#f60">0.95&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> decimals: int = &lt;span style="color:#f60">3&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; pl.DataFrame:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Effect size for binary categorical variables and continuous variables.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df (pl.DataFrame or pl.LazyFrame): Dataframe
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> intervention (str): Intervention column name with treatment and control assignments
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> categorical (str or list): Categorical variable column name(s)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> continuous (str or list): Continuous variable column name(s)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> coverage (float): Coverage of the confidence interval, default 0.95
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> decimals (int): Number of decimals to round the effect size, default 3
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> (pl.DataFrame): Dataframe with one row per variable: effect size and confidence interval bounds
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Examples:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ```python
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df = pl.DataFrame({
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;intervention&amp;#34;: [&amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;],
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;categorical_var&amp;#34;: [&amp;#34;X&amp;#34;, &amp;#34;Y&amp;#34;, &amp;#34;Y&amp;#34;, &amp;#34;Y&amp;#34;, &amp;#34;X&amp;#34;, &amp;#34;X&amp;#34;], # both groups contain both X and Y
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;continuous_var&amp;#34;: [1.2, 2.3, 1.5, 2.8, 1.7, 2.9],
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> })
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> standardized_difference(df, &amp;#34;intervention&amp;#34;, categorical=&amp;#34;categorical_var&amp;#34;, continuous=&amp;#34;continuous_var&amp;#34;)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> # Returns a DataFrame with effect sizes for categorical and continuous variables
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> # variable,type,SD,2.5%,97.5%
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> # &amp;#34;categorical_var&amp;#34;,&amp;#34;categorical&amp;#34;,0.707,-0.943,2.357
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> # &amp;#34;continuous_var&amp;#34;,&amp;#34;continuous&amp;#34;,4.157,1.312,7.002
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ```
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper = round(&lt;span style="color:#f60">1&lt;/span> - ((&lt;span style="color:#f60">1&lt;/span> - coverage) / &lt;span style="color:#f60">2&lt;/span>), &lt;span style="color:#f60">3&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lower = round(&lt;span style="color:#f60">1&lt;/span> - upper, &lt;span style="color:#f60">3&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> upper, lower = upper * &lt;span style="color:#f60">100&lt;/span>, lower * &lt;span style="color:#f60">100&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Determine which columns we need and select them efficiently&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> needed_cols = [intervention]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> categorical is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cat_cols = categorical &lt;span style="color:#f00">if&lt;/span> isinstance(categorical, list) &lt;span style="color:#f00">else&lt;/span> [categorical]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> needed_cols.extend(cat_cols)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> continuous is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cont_cols = continuous &lt;span style="color:#f00">if&lt;/span> isinstance(continuous, list) &lt;span style="color:#f00">else&lt;/span> [continuous]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> needed_cols.extend(cont_cols)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_subset = df.select(needed_cols).collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_subset = df.select(needed_cols)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> results = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> categorical is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> categorical = categorical &lt;span style="color:#f00">if&lt;/span> isinstance(categorical, list) &lt;span style="color:#f00">else&lt;/span> [categorical]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> col in categorical:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> es_result = effect_size_categorical(df_subset, intervention, col, coverage=coverage, decimals=decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> results.append(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;variable&amp;#34;&lt;/span>: col,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;categorical&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;SD&amp;#34;&lt;/span>: es_result.standardized_difference,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>lower&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">%&amp;#34;&lt;/span>: es_result.ci_lower,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>upper&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">%&amp;#34;&lt;/span>: es_result.ci_upper,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> continuous is not &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> continuous = continuous &lt;span style="color:#f00">if&lt;/span> isinstance(continuous, list) &lt;span style="color:#f00">else&lt;/span> [continuous]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> col in continuous:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> es_result = effect_size_continuous(df_subset, intervention, col, coverage=coverage, decimals=decimals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> results.append(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;variable&amp;#34;&lt;/span>: col,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;type&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;continuous&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;SD&amp;#34;&lt;/span>: es_result.standardized_difference,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>lower&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">%&amp;#34;&lt;/span>: es_result.ci_lower,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>upper&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">%&amp;#34;&lt;/span>: es_result.ci_upper,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not results:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pl.DataFrame()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> pl.DataFrame(results)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">kruskal_wallis_test&lt;/span>(df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform Kruskal-Wallis test for numeric variable across categorical covariate (&amp;gt;2 categories).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Tests whether median variable varies across values of covariate.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> categories = df_clean[covariate].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> category in categories:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_data = df_clean.filter(pl.col(covariate) == category)[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups.append(group_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> statistic, p_value = stats.kruskal(*groups)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value at 5% significance level (chi-squared distribution with k-1 df)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_val = len(categories) - &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = float(stats.chi2.ppf(&lt;span style="color:#f60">0.95&lt;/span>, df_val))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(statistic &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Median &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;varies&amp;#39;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;does not vary&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> across values of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Kruskal-Wallis&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(statistic),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=critical_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">mann_whitney_test&lt;/span>(df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform Mann-Whitney U test for numeric variable across categorical covariate (2 categories).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Tests whether median variable varies across values of covariate.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> categories = df_clean[covariate].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(categories) != &lt;span style="color:#f60">2&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">&amp;#34;Mann-Whitney test requires exactly 2 categories&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group1 = df_clean.filter(pl.col(covariate) == categories[&lt;span style="color:#f60">0&lt;/span>])[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group2 = df_clean.filter(pl.col(covariate) == categories[&lt;span style="color:#f60">1&lt;/span>])[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> statistic, p_value = stats.mannwhitneyu(group1, group2, alternative=&lt;span style="color:#87ceeb">&amp;#34;two-sided&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Normal approximation: convert U to a z-score (no tie or continuity correction)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n1, n2 = len(group1), len(group2)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mean_u = n1 * n2 / &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> std_u = np.sqrt(n1 * n2 * (n1 + n2 + &lt;span style="color:#f60">1&lt;/span>) / &lt;span style="color:#f60">12&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> z_score = abs((statistic - mean_u) / std_u) &lt;span style="color:#f00">if&lt;/span> std_u &amp;gt; &lt;span style="color:#f60">0&lt;/span> &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = float(stats.norm.ppf(&lt;span style="color:#f60">0.975&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(z_score &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Median &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;varies&amp;#39;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;does not vary&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> across values of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Mann-Whitney&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(z_score),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=critical_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">anova_test&lt;/span>(df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform one-way ANOVA test for numeric variable across categorical covariate (&amp;gt;2 categories).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Tests whether mean variable varies across values of covariate.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> categories = df_clean[covariate].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> category in categories:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_data = df_clean.filter(pl.col(covariate) == category)[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups.append(group_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># ANOVA needs within-group variance: reject groups with fewer than 2 observations&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_sizes = [len(group) &lt;span style="color:#f00">for&lt;/span> group in groups]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> all(size &amp;lt;= &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#f00">for&lt;/span> size in group_sizes):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;ANOVA requires at least one group with more than 1 observation. All groups for &amp;#39;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#39; have only 1 observation each.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> any(size &amp;lt;= &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#f00">for&lt;/span> size in group_sizes):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> single_obs_groups = [categories[i] &lt;span style="color:#f00">for&lt;/span> i, size in enumerate(group_sizes) &lt;span style="color:#f00">if&lt;/span> size &amp;lt;= &lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;ANOVA requires all groups to have more than 1 observation. Groups with single observations: &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>single_obs_groups&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> statistic, p_value = stats.f_oneway(*groups)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value at 5% significance level (F-distribution)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_between = len(categories) - &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_within = len(df_clean) - len(categories)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = float(stats.f.ppf(&lt;span style="color:#f60">0.95&lt;/span>, df_between, df_within))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(statistic &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Mean &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;varies&amp;#39;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;does not vary&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> across values of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;ANOVA&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(statistic),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=critical_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">t_test&lt;/span>(df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform independent t-test for numeric variable across categorical covariate (2 categories).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Tests whether mean variable varies across values of covariate.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> categories = df_clean[covariate].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(categories) != &lt;span style="color:#f60">2&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">&amp;#34;T-test requires exactly 2 categories&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group1 = df_clean.filter(pl.col(covariate) == categories[&lt;span style="color:#f60">0&lt;/span>])[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group2 = df_clean.filter(pl.col(covariate) == categories[&lt;span style="color:#f60">1&lt;/span>])[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> statistic, p_value = stats.ttest_ind(group1, group2)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value at 5% significance level (t-distribution)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_val = len(group1) + len(group2) - &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = float(stats.t.ppf(&lt;span style="color:#f60">0.975&lt;/span>, df_val))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(abs(statistic) &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Mean &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;varies&amp;#39;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;does not vary&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> across values of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;T-test&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(abs(statistic)),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=critical_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">chi_squared_test&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str, n_quantiles: int = &lt;span style="color:#f60">4&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform Chi-squared test of independence between two variables.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> If the covariate is numeric, it is binned into quantiles for the test.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> If both variables are categorical, tests their independence directly.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if covariate is numeric or categorical&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariate_is_numeric = df_clean[covariate].dtype.is_numeric()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> covariate_is_numeric:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Bin the numeric covariate into quantiles&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_bins = bin_numeric_column(df_clean, covariate, n_quantiles, &lt;span style="color:#87ceeb">&amp;#34;quantile&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_quantiles = df_with_bins.with_columns(pl.col(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>).alias(&lt;span style="color:#87ceeb">&amp;#34;quantile_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariate_col = &lt;span style="color:#87ceeb">&amp;#34;quantile_bins&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariate_values = [&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Q&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#f00">for&lt;/span> i in range(n_quantiles)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use categorical covariate directly&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_quantiles = df_clean
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariate_col = covariate
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> covariate_values = df_clean[covariate].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create contingency table&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> contingency = df_with_quantiles.group_by([variable, covariate_col]).agg(pl.len().alias(&lt;span style="color:#87ceeb">&amp;#34;count&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Convert to matrix format for chi-squared test&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> variables = df_clean[variable].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> quantiles = covariate_values
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> observed = np.zeros((len(variables), len(quantiles)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, var_val in enumerate(variables):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> j, quant_val in enumerate(quantiles):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> count_row = contingency.filter((pl.col(variable) == var_val) &amp;amp; (pl.col(covariate_col) == quant_val))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(count_row) &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> observed[i, j] = count_row[&lt;span style="color:#87ceeb">&amp;#34;count&amp;#34;&lt;/span>][&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Handle case where we have 2x2 table - use Fisher&amp;#39;s exact test&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> observed.shape == (&lt;span style="color:#f60">2&lt;/span>, &lt;span style="color:#f60">2&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> fishers_exact_test(df, variable, covariate, n_quantiles)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Perform chi-squared test&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> statistic, p_value, dof, _ = stats.chi2_contingency(observed)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value at 5% significance level&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = float(stats.chi2.ppf(&lt;span style="color:#f60">0.95&lt;/span>, dof))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(statistic &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> covariate_is_numeric:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = (
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> and quantile of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> are &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;not independent&amp;#39;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;independent&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> and &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> are &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;not independent&amp;#39;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;independent&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create contingency table data for explanation&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> contingency_data = {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;observed&amp;#34;&lt;/span>: observed.tolist(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;variable_levels&amp;#34;&lt;/span>: variables,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;covariate_levels&amp;#34;&lt;/span>: quantiles,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;variable_name&amp;#34;&lt;/span>: variable,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;covariate_name&amp;#34;&lt;/span>: covariate &lt;span style="color:#f00">if&lt;/span> not covariate_is_numeric &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> (quantiles)&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Chi-squared&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(statistic),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=critical_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> contingency_table=contingency_data,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">fishers_exact_test&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str, n_quantiles: int = &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform Fisher&amp;#39;s exact test for 2x2 contingency table.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Special case of chi-squared test when both variable and covariate have only 2 levels.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Bin the numeric covariate into quantiles (should be 2 for Fisher&amp;#39;s exact)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_bins = bin_numeric_column(df_clean, covariate, n_quantiles, &lt;span style="color:#87ceeb">&amp;#34;quantile&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_quantiles = df_with_bins.with_columns(pl.col(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>).alias(&lt;span style="color:#87ceeb">&amp;#34;quantile_bins&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create 2x2 contingency table&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> contingency = df_with_quantiles.group_by([variable, &lt;span style="color:#87ceeb">&amp;#34;quantile_bins&amp;#34;&lt;/span>]).agg(pl.len().alias(&lt;span style="color:#87ceeb">&amp;#34;count&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> variables = df_clean[variable].unique().sort().to_list()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> quantiles = [&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Q&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#f00">for&lt;/span> i in range(n_quantiles)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(variables) != &lt;span style="color:#f60">2&lt;/span> or len(quantiles) != &lt;span style="color:#f60">2&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">&amp;#34;Fisher&amp;#39;s exact test requires 2x2 contingency table&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> observed = np.zeros((&lt;span style="color:#f60">2&lt;/span>, &lt;span style="color:#f60">2&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i, var_val in enumerate(variables):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> j, quant_val in enumerate(quantiles):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> count_row = contingency.filter((pl.col(variable) == var_val) &amp;amp; (pl.col(&lt;span style="color:#87ceeb">&amp;#34;quantile_bins&amp;#34;&lt;/span>) == quant_val))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(count_row) &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> observed[i, j] = count_row[&lt;span style="color:#87ceeb">&amp;#34;count&amp;#34;&lt;/span>][&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Perform Fisher&amp;#39;s exact test&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> odds_ratio, p_value = stats.fisher_exact(observed)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Approximate a Wald z-score from the log odds ratio; zero cells are left out of the standard error&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> float(odds_ratio) &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> z_score = abs(np.log(float(odds_ratio)) / np.sqrt(np.sum(&lt;span style="color:#f60">1&lt;/span> / observed[observed &amp;gt; &lt;span style="color:#f60">0&lt;/span>])))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> z_score = &lt;span style="color:#f60">0.0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value for z-score at 5% significance level&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = float(stats.norm.ppf(&lt;span style="color:#f60">0.975&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
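&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Decide significance from the exact p-value; the z-score is only a reported approximation&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(p_value &amp;lt; &lt;span style="color:#f60">0.05&lt;/span>)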
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> and quantile of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> are &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;not independent&amp;#39;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;independent&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Fisher&amp;#39;s Exact&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(z_score),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=critical_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">kolmogorov_smirnov_test&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: Optional[str] = &lt;span style="color:#f00">None&lt;/span>, n_bins: int = &lt;span style="color:#f60">6&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform Kolmogorov-Smirnov test for a numeric variable.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> If no covariate: Tests whether the variable departs from a uniform distribution.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> If covariate provided: Tests whether the distribution of variable varies across covariate groups.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> For numeric covariates, uses n_bins quantiles. For categorical covariates, uses category groups.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df (Union[pl.DataFrame, pl.LazyFrame]): Input dataframe (a LazyFrame is collected internally)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> variable (str): Name of the numeric variable to test
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> covariate (Optional[str]): Name of the covariate (None for single-variable test)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> n_bins (int): Number of bins to use for numeric covariates (default: 6)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> covariate is &lt;span style="color:#f00">None&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># One-sample KS test against a uniform distribution&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data = df_clean[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Normalize data to [0, 1] range for uniform distribution test&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data_min, data_max = data.min(), data.max()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> data_max == data_min:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Constant data - definitely not uniform&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Kolmogorov-Smirnov&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=&lt;span style="color:#f60">1.0&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=&lt;span style="color:#f60">0.0&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=&lt;span style="color:#f60">0.0&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> departs from a uniform distribution.&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> normalized_data = (data - data_min) / (data_max - data_min)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Perform KS test against uniform distribution&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> statistic, p_value = stats.kstest(normalized_data, &lt;span style="color:#87ceeb">&amp;#34;uniform&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value at 5% significance level for KS test&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = len(normalized_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = &lt;span style="color:#f60">1.36&lt;/span> / np.sqrt(n) &lt;span style="color:#0f0"># Approximation for large n at alpha=0.05&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(statistic &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>&lt;span style="color:#87ceeb">&amp;#39;departs from&amp;#39;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;does not depart from&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> a uniform distribution.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Kolmogorov-Smirnov&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(statistic),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=float(critical_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Two-sample KS tests comparing distributions across covariate groups&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if covariate is numeric or categorical&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">try&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Try to convert to numeric - if it works, treat as numeric&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean.with_columns(pl.col(covariate).cast(pl.Float64))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_numeric_covariate = &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">except&lt;/span> (pl.ComputeError, pl.InvalidOperationError, ValueError, TypeError):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> is_numeric_covariate = &lt;span style="color:#f00">False&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> is_numeric_covariate:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Numeric covariate: bin into n_bins quantiles&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_bins = bin_numeric_column(df_clean, covariate, n_bins, &lt;span style="color:#87ceeb">&amp;#34;quantile&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_quantiles = df_with_bins.with_columns(pl.col(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>).alias(&lt;span style="color:#87ceeb">&amp;#34;covariate_groups&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups = [&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Q&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#f00">for&lt;/span> i in range(n_bins)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion_template = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;The distribution of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{{}}&lt;/span>&lt;span style="color:#87ceeb"> across quantile of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Categorical covariate: use categories as groups&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_quantiles = df_clean.with_columns(pl.col(covariate).alias(&lt;span style="color:#87ceeb">&amp;#34;covariate_groups&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups = sorted(df_clean[covariate].unique().to_list())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion_template = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;The distribution of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{{}}&lt;/span>&lt;span style="color:#87ceeb"> across values of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Collect the values of the variable for each non-empty covariate group&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_data = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> group in groups:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_values = df_with_quantiles.filter(pl.col(&lt;span style="color:#87ceeb">&amp;#34;covariate_groups&amp;#34;&lt;/span>) == group)[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(group_values) &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_data.append(group_values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(group_data) &amp;lt; &lt;span style="color:#f60">2&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">&amp;#34;Need at least 2 groups with data for comparison&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Use maximum KS statistic from all pairwise comparisons&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> max_statistic = &lt;span style="color:#f60">0.0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_p_value = &lt;span style="color:#f60">1.0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(len(group_data)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> j in range(i + &lt;span style="color:#f60">1&lt;/span>, len(group_data)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ks_stat, ks_p = stats.ks_2samp(group_data[i], group_data[j])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> max_statistic = max(max_statistic, ks_stat)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> min_p_value = min(min_p_value, ks_p)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value for the two-sample KS test at alpha=0.05&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Approximation: sqrt(-0.5 * ln(alpha/2)) * sqrt((n1+n2)/(n1*n2)), where sqrt(-0.5 * ln(0.025)) is about 1.36&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_n = sum(len(group) &lt;span style="color:#f00">for&lt;/span> group in group_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> avg_group_size = total_n / len(group_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = &lt;span style="color:#f60">1.36&lt;/span> * np.sqrt(&lt;span style="color:#f60">2&lt;/span> / avg_group_size) &lt;span style="color:#0f0"># Treats every pair as n1 = n2 = avg_group_size&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(max_statistic &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = conclusion_template.format(&lt;span style="color:#87ceeb">&amp;#34;varies&amp;#34;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;does not vary&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Report the minimum pairwise p-value (not adjusted for multiple comparisons)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value = min_p_value
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic = max_statistic
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Kolmogorov-Smirnov&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(test_statistic),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=float(critical_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">quantile_distribution_test&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str, n_bins: int = &lt;span style="color:#f60">6&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Test whether the distribution of a numeric variable varies across quantiles of a numeric covariate.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Uses an ANOVA-style F-statistic that compares between-group and within-group variation.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Unlike the Kolmogorov-Smirnov test, it is sensitive to differences in group means
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> rather than to differences in the cumulative distributions.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> df (pl.DataFrame | pl.LazyFrame): Input dataframe (a LazyFrame is collected internally)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> variable (str): Name of the numeric variable to test
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> covariate (str): Name of the numeric covariate to bin into quantiles
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> n_bins (int): Number of quantile bins to create (default: 6)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> StatisticalTestResult: Test results with F-statistic approach
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Clean data - select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Bin the covariate into quantiles&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_bins = bin_numeric_column(df_clean, covariate, n_bins, &lt;span style="color:#87ceeb">&amp;#34;quantile&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_quantiles = df_with_bins.with_columns(pl.col(&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">_bins&amp;#34;&lt;/span>).alias(&lt;span style="color:#87ceeb">&amp;#34;covariate_groups&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Collect group data for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups = [&lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;Q&lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>i + &lt;span style="color:#f60">1&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#f00">for&lt;/span> i in range(n_bins)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_data = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_means = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_sizes = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> group in groups:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_values = df_with_quantiles.filter(pl.col(&lt;span style="color:#87ceeb">&amp;#34;covariate_groups&amp;#34;&lt;/span>) == group)[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(group_values) &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_data.append(group_values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_means.append(np.mean(group_values))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> group_sizes.append(len(group_values))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(group_data) &amp;lt; &lt;span style="color:#f60">2&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">&amp;#34;Need at least 2 groups with data for comparison&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate test statistic using F-statistic approach (ANOVA-like)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># It compares variation between quantile-group means with variation within groups&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> all_values = np.concatenate(group_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> overall_mean = np.mean(all_values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n_groups = len(group_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_n = len(all_values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Between-group sum of squares&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> between_group_ss = sum(size * (mean - overall_mean) ** &lt;span style="color:#f60">2&lt;/span> &lt;span style="color:#f00">for&lt;/span> size, mean in zip(group_sizes, group_means))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Within-group sum of squares&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> within_group_ss = sum(np.sum((data - np.mean(data)) ** &lt;span style="color:#f60">2&lt;/span>) &lt;span style="color:#f00">for&lt;/span> data in group_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Test statistic: between-group / within-group sum of squares (not scaled by degrees of freedom)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic = between_group_ss / within_group_ss &lt;span style="color:#f00">if&lt;/span> within_group_ss &amp;gt; &lt;span style="color:#f60">0&lt;/span> &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#f60">0.0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value using F-distribution with specific degrees of freedom&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Using (k-1, k+1) degrees of freedom based on empirical testing&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_numerator = n_groups - &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#0f0"># 5 for 6 groups&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_denominator = n_groups + &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#0f0"># 7 for 6 groups&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = stats.f.ppf(&lt;span style="color:#f60">0.95&lt;/span>, df_numerator, df_denominator)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># P-value calculation using F-distribution&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> within_group_ss &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Convert to F-statistic for p-value calculation&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> between_group_ms = between_group_ss / df_numerator
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> within_group_ms = within_group_ss / (total_n - n_groups)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> f_statistic = between_group_ms / within_group_ms &lt;span style="color:#f00">if&lt;/span> within_group_ms &amp;gt; &lt;span style="color:#f60">0&lt;/span> &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#f60">0.0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value = stats.f.sf(f_statistic, df_numerator, total_n - n_groups)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value = &lt;span style="color:#f60">1.0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(test_statistic &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion_template = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;The distribution of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> &lt;/span>&lt;span style="color:#87ceeb">{{}}&lt;/span>&lt;span style="color:#87ceeb"> across quantiles of &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = conclusion_template.format(&lt;span style="color:#87ceeb">&amp;#34;varies&amp;#34;&lt;/span> &lt;span style="color:#f00">if&lt;/span> significant &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;does not vary&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Quantile Distribution Test&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(test_statistic),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=float(critical_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">pearson_correlation_test&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform Pearson Product-Moment Correlation test between two numeric variables.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Tests whether a correlation exists between variable and covariate.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(df_clean) &amp;lt; &lt;span style="color:#f60">3&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">&amp;#34;Need at least 3 observations for correlation test&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_data = df_clean[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cov_data = df_clean[covariate].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Perform Pearson correlation test&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> correlation, p_value = stats.pearsonr(var_data, cov_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Degrees of freedom&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = len(df_clean)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_val = n - &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate t-statistic for correlation&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> abs(correlation) == &lt;span style="color:#f60">1.0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t_stat = float(&lt;span style="color:#87ceeb">&amp;#34;inf&amp;#34;&lt;/span>) &lt;span style="color:#f00">if&lt;/span> correlation &amp;gt; &lt;span style="color:#f60">0&lt;/span> &lt;span style="color:#f00">else&lt;/span> float(&lt;span style="color:#87ceeb">&amp;#34;-inf&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t_stat = correlation * np.sqrt(df_val) / np.sqrt(&lt;span style="color:#f60">1&lt;/span> - correlation**&lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value at 5% significance level (t-distribution)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = float(stats.t.ppf(&lt;span style="color:#f60">0.975&lt;/span>, df_val))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(abs(t_stat) &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Determine correlation direction&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> significant:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> correlation &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> direction = &lt;span style="color:#87ceeb">&amp;#34;positive&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> direction = &lt;span style="color:#87ceeb">&amp;#34;negative&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;A &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>direction&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> correlation exists between &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> and &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;No significant correlation exists between &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> and &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Pearson Correlation&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(correlation),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=critical_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> degrees_of_freedom=df_val,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">spearman_correlation_test&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df: Union[pl.DataFrame, pl.LazyFrame], variable: str, covariate: str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>) -&amp;gt; StatisticalTestResult:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Perform Spearman&amp;#39;s Rank Correlation test between a numeric variable and categorical covariate.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> The categorical covariate is converted to numeric ranks for correlation analysis.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Tests whether a monotonic correlation exists between variable and covariate.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Select only needed columns and collect for analysis&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> isinstance(df, pl.LazyFrame):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls().collect()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_clean = df.select([variable, covariate]).drop_nulls()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(df_clean) &amp;lt; &lt;span style="color:#f60">3&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(&lt;span style="color:#87ceeb">&amp;#34;Need at least 3 observations for correlation test&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Convert categorical covariate to numeric by ordering categories and assigning ranks&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> categories = sorted(df_clean[covariate].unique().to_list())
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> category_to_rank = {cat: i &lt;span style="color:#f00">for&lt;/span> i, cat in enumerate(categories)}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Create numeric representation of categorical variable&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_with_ranks = df_clean.with_columns(pl.col(covariate).replace(category_to_rank).alias(&lt;span style="color:#87ceeb">&amp;#34;covariate_rank&amp;#34;&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> var_data = df_with_ranks[variable].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cov_rank_data = df_with_ranks[&lt;span style="color:#87ceeb">&amp;#34;covariate_rank&amp;#34;&lt;/span>].to_numpy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Perform Spearman correlation test&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> correlation, p_value = stats.spearmanr(var_data, cov_rank_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Degrees of freedom&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = len(df_clean)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df_val = n - &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Calculate t-statistic for correlation&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> abs(correlation) == &lt;span style="color:#f60">1.0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t_stat = float(&lt;span style="color:#87ceeb">&amp;#34;inf&amp;#34;&lt;/span>) &lt;span style="color:#f00">if&lt;/span> correlation &amp;gt; &lt;span style="color:#f60">0&lt;/span> &lt;span style="color:#f00">else&lt;/span> float(&lt;span style="color:#87ceeb">&amp;#34;-inf&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t_stat = correlation * np.sqrt(df_val) / np.sqrt(&lt;span style="color:#f60">1&lt;/span> - correlation**&lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Critical value at 5% significance level (t-distribution)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value = float(stats.t.ppf(&lt;span style="color:#f60">0.975&lt;/span>, df_val))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant = bool(abs(t_stat) &amp;gt; critical_value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Determine correlation direction&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> significant:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> correlation &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> direction = &lt;span style="color:#87ceeb">&amp;#34;positive&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> direction = &lt;span style="color:#87ceeb">&amp;#34;negative&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;A &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>direction&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> correlation exists between &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> and &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion = &lt;span style="color:#87ceeb">f&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;No significant correlation exists between &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>variable&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb"> and &lt;/span>&lt;span style="color:#87ceeb">{&lt;/span>covariate&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> StatisticalTestResult(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_name=&lt;span style="color:#87ceeb">&amp;#34;Spearman Correlation&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test_statistic=float(correlation),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> critical_value=critical_value,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> significant=significant,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> p_value=float(p_value),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> conclusion=conclusion,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> degrees_of_freedom=df_val,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Forecasting by frequency interpolation of time series</title><link>https://asifr.com/fits/</link><pubDate>Sat, 18 Jan 2025 00:00:00 +0000</pubDate><guid>https://asifr.com/fits/</guid><description>
&lt;p>The FITS algorithm, proposed by Xu et al. in &lt;a href="https://arxiv.org/abs/2307.03756">FITS: Modeling time series with 10k parameters&lt;/a>, applies a neat trick from signal processing to forecasting.&lt;/p>
&lt;p>The principle it exploits is that frequency resolution and signal length are tied together: a longer time series yields more finely spaced frequency bins, so interpolating the spectrum to more bins corresponds to a longer signal in the time domain.&lt;/p>
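&lt;p>A minimal NumPy sketch of that length bookkeeping follows. It is not the FITS model: the helper &lt;code>upsample_via_spectrum&lt;/code> is a hypothetical name, and it fills the extra frequency bins with zeros, whereas FITS learns them. It only shows that a spectrum with more bins inverts to a longer real signal.&lt;/p>

```python
import numpy as np

def upsample_via_spectrum(x, new_length):
    """Zero-pad the rFFT of a real signal and invert it at a longer length."""
    n = len(x)
    spectrum = np.fft.rfft(x)                  # n // 2 + 1 complex bins
    padded = np.zeros(new_length // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum         # extra bins stay zero (assumption)
    # irfft divides by the output length, so rescale to keep the amplitude.
    return np.fft.irfft(padded, n=new_length) * (new_length / n)

t = np.arange(32)
x = np.sin(2 * np.pi * 2 * t / 32)             # 2 cycles over 32 samples
y = upsample_via_spectrum(x, 64)

print(len(np.fft.rfft(x)), len(y))             # 17 64
```

&lt;p>With zero-filled bins the longer output is simply a resampled copy of the input, so every second sample of &lt;code>y&lt;/code> matches &lt;code>x&lt;/code>. Learning the new bins instead is what lets the longer output carry a forecast.&lt;/p>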
&lt;p>The algorithm extends the resolution of the power spectral density as follows:&lt;/p>
&lt;ol>
&lt;li>De-mean the time series by subtracting the mean. This gets rid of the DC component (the dominant zero-frequency amplitude in the PSD).&lt;/li>
&lt;li>Compute the real-input FFT (rFFT) of the time series, which returns a complex-valued signal.
&lt;ol>
&lt;li>For a real series of length N, the rFFT returns N/2+1 complex numbers.&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>Learn a mapping over the complex-valued signal (amplitude and phase) using a linear layer with complex-valued weights.
&lt;ol>
&lt;li>The layer has input size &lt;code>F&lt;/code> and output size &lt;code>F * length_ratio&lt;/code>, where &lt;code>F&lt;/code> is the length of the complex-valued signal and &lt;code>length_ratio&lt;/code> is &lt;code>(sequence_length + prediction_length) / sequence_length&lt;/code>. The length ratio is therefore &amp;gt; 1, and &lt;code>F * length_ratio&lt;/code> is larger than the original frequency resolution &lt;code>F&lt;/code>.&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>Add back the mean to the predicted signal.&lt;/li>
&lt;li>Compute the inverse rFFT to get the predicted time series, which is now longer than the original time series.&lt;/li>
&lt;/ol>
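&lt;p>The steps above can be sketched as a single forward pass in NumPy. This is an illustration under stated assumptions, not the authors' implementation: &lt;code>fits_forward&lt;/code>, &lt;code>weight&lt;/code>, and &lt;code>bias&lt;/code> are hypothetical names, the weights are random rather than trained, and the mean is added back in the time domain after the inverse transform.&lt;/p>

```python
import numpy as np

def fits_forward(x, prediction_length, weight, bias):
    """Forward pass for one 1-D series; weight and bias stand in for a trained layer."""
    sequence_length = len(x)
    out_length = sequence_length + prediction_length
    length_ratio = out_length / sequence_length   # ratio of output to input length

    mean = x.mean()
    spectrum = np.fft.rfft(x - mean)               # de-mean, then rFFT
    upsampled = weight @ spectrum + bias           # complex-valued linear layer

    y = np.fft.irfft(upsampled, n=out_length)      # longer series in the time domain
    # irfft divides by the output length, so rescale by length_ratio,
    # then restore the mean (done in the time domain in this sketch).
    return y * length_ratio + mean

sequence_length, prediction_length = 96, 48
n_in = sequence_length // 2 + 1                          # input frequency bins
n_out = (sequence_length + prediction_length) // 2 + 1   # output frequency bins

rng = np.random.default_rng(0)
# Random, untrained complex weights (assumption, for illustration only).
weight = (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))) / n_in
bias = np.zeros(n_out, dtype=complex)

x = np.sin(2 * np.pi * np.arange(sequence_length) / 24) + 5.0
y = fits_forward(x, prediction_length, weight, bias)
print(y.shape)                                           # (144,)
```

&lt;p>Here the layer maps 49 input bins to 73 output bins, which matches &lt;code>F * length_ratio&lt;/code> rounded down. In training, the first 96 output samples would be compared against the look-back window and the last 48 against the forecast horizon.&lt;/p>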
&lt;p>The method is fast because the FFT is fast. It is also memory efficient, because working in the frequency domain with complex-valued parameters reduces the number of parameters to learn. A major benefit is that the model is supervised on both forecasting over the horizon and backcasting over the look-back window. One concern is that the method may not capture local-linear trends in the time series well, though it should do a good job of capturing periodicity.&lt;/p></description></item><item><title>Small language models (updated June 2025)</title><link>https://asifr.com/small-language-models/</link><pubDate>Mon, 30 Dec 2024 00:00:00 +0000</pubDate><guid>https://asifr.com/small-language-models/</guid><description>
&lt;p>Last updated: &lt;strong>2025-06-15&lt;/strong>&lt;/p>
&lt;p>Small language models are increasingly capable of performing a wide range of tasks locally on-device and in the web browser. This page lists some interesting small language models. I am classifying small models as those with fewer than 1 billion parameters. This page will be updated regularly as I evaluate new models.&lt;/p>
&lt;p>&lt;strong>Why?&lt;/strong> Small language models are ideal for structured problems where reasoning and &amp;ldquo;thinking&amp;rdquo; are not necessary. This covers a wide range of use cases, such as entity extraction, structured data extraction, summarization, classification, multi-turn conversations, text composition, text revision, and content tagging. Small LMs are also ideal candidates for fine-tuning to learn domain-specific knowledge.&lt;/p>
&lt;p>&lt;strong>Limitations&lt;/strong>: By virtue of being small and compressed, small language models have some important limitations. Complex reasoning tasks should be broken down into simpler steps. Math and code generation tasks are best avoided with small LMs. They also have limited world knowledge, unlike larger models that have &amp;ldquo;overfit&amp;rdquo; or memorized large amounts of factual information up to their training cutoff date. As such, small LMs are more likely to hallucinate (provide made-up information) when asked about facts or events. Fine-tuning is not necessary for many tasks and comes with its own challenges, such as forgetting world knowledge learned during pretraining, but it can still be useful for teaching a small LM domain-specific knowledge.&lt;/p>
&lt;p>&lt;a href="https://github.com/huggingface/smollm">SmolLM2&lt;/a> from HuggingFace (&lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct">135M&lt;/a>, &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct">360M&lt;/a>, &lt;a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct#model-summary">1.7B&lt;/a>) - General-purpose. The smallest models can run efficiently in the web browser. Good for entity extraction, summarizing short texts, and structured data extraction.&lt;/p>
&lt;p>&lt;a href="https://numind.ai/blog/nuextract-a-foundation-model-for-structured-extraction">NuExtract-v1.5&lt;/a> from NuMind (&lt;a href="https://huggingface.co/numind/NuExtract-tiny">Tiny&lt;/a>, &lt;a href="https://huggingface.co/numind/NuExtract">Base&lt;/a>, &lt;a href="https://huggingface.co/numind/NuExtract-large">Large&lt;/a>) - Fine-tuned for structured entity extraction. This model takes a text input and an example JSON output and returns a JSON string that matches the example schema. This is different from structured output generation through token sampling. NuExtract generates JSON strings directly and is trained to do so with high accuracy. Token sampling, on the other hand, constrains the decoder logits to produce valid JSON that follows a predefined grammar or regex pattern. Token sampling is more flexible and works with any base model, but it adds overhead to the generation process. NuExtract promises to be more efficient (by directly producing JSON strings) and more accurate for structured extraction tasks.&lt;/p>
&lt;p>&lt;a href="https://github.com/Snowflake-Labs/arctic-embed">Arctic-embed&lt;/a> from Snowflake (22M-335M) - Embedding-only, excels at retrieval tasks. I&amp;rsquo;ve used this model for a few retrieval problems and it works as advertised. I&amp;rsquo;ve found these smaller embedding models are a good alternative to BERT for pure retrieval tasks.&lt;/p>
&lt;p>&lt;a href="https://huggingface.co/nomic-ai/nomic-embed-text-v1.5">Nomic-Embed-text&lt;/a> from &lt;a href="https://www.nomic.ai/blog/posts/nomic-embed-text-v1">Nomic&lt;/a> - Embedding-only, excels at retrieval tasks. I&amp;rsquo;ve found Nomic to be highly capable for its size. The model handles sequence lengths from 2048 to 8192 input tokens. nomic-embed-text-v1.5 was trained with Matryoshka Representation Learning, which means you can choose the output embedding dimension from 64 up to 768. The highest dimension, 768, is &lt;a href="https://huggingface.co/nomic-ai/nomic-embed-text-v1.5#adjusting-dimensionality">most accurate&lt;/a>, and accuracy is decent down to 256 dimensions, after which it drops off quickly.&lt;/p>
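&lt;p>The general idea of choosing a smaller Matryoshka dimension can be sketched as follows: keep the leading components of each embedding and re-normalize. This is a simplified illustration with random stand-in vectors, and &lt;code>truncate_embeddings&lt;/code> is a hypothetical helper. The model card may prescribe extra steps (for example a normalization before truncation), so check it for the exact recipe.&lt;/p>

```python
import numpy as np

def truncate_embeddings(embeddings, dim):
    """Keep the first `dim` components of each row and rescale rows to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)   # guard against zero vectors

rng = np.random.default_rng(0)
full = rng.normal(size=(4, 768))        # stand-in for 768-dimensional model outputs
small = truncate_embeddings(full, 256)  # 3x smaller vectors for the index
print(small.shape)  # (4, 256)
```

&lt;p>Because the rows are unit length again, cosine similarity between truncated vectors reduces to a plain dot product.&lt;/p>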
&lt;p>&lt;a href="https://qwenlm.github.io/blog/qwen3/">Qwen 3&lt;/a> provides a &lt;a href="https://huggingface.co/Qwen/Qwen3-0.6B">0.6B&lt;/a> parameter model with a 32K context length, a size of 1.5GB, and an Apache 2.0 license. This model is small but has thinking and reasoning capability. Ollama supports Qwen 3, and thinking can be enabled or disabled using the &lt;code>think&lt;/code> parameter or by passing &lt;code>/no_think&lt;/code> in the prompt. The Qwen 3 docs provide some guidance and best practices:&lt;/p>
&lt;blockquote>
&lt;p>For thinking mode, use Temperature=0.6, TopP=0.95, TopK=20, and MinP=0 (the default setting in generation_config.json). DO NOT use greedy decoding, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the &lt;a href="https://huggingface.co/Qwen/Qwen3-0.6B#best-practices">Best Practices&lt;/a> section.&lt;/p>&lt;/blockquote></description></item><item><title>Convolutions as spectral filters</title><link>https://asifr.com/convolutions-as-spectral-filters/</link><pubDate>Tue, 24 Dec 2024 00:00:00 +0000</pubDate><guid>https://asifr.com/convolutions-as-spectral-filters/</guid><description>
&lt;p>Convolution in the time domain is a sliding dot-product between a kernel and a signal. This operation requires that we align the kernel with each position of the signal and compute the dot-product at each position, making sure that the kernel does not extend beyond the signal boundaries by padding the signal and cutting the output to the original signal length.&lt;/p>
&lt;p>Alternatively, a convolution in the time domain is equivalent to element-wise multiplication in the frequency domain, so we can perform the convolution more efficiently using the Fast Fourier Transform (FFT). We compute the complex spectra of both the signal and the kernel using the FFT and multiply them element-wise. Finally, we reconstruct the convolved signal using the inverse FFT. The Power Spectral Density (PSD), the squared magnitude of the spectrum, is a convenient way to visualize this product, although the phase carried by the complex values is needed to recover the signal. FFT-based convolution is faster than direct convolution, especially for long signals and kernels.&lt;/p>
&lt;p>&lt;img src="https://asifr.com/images/freq-diagram.png" alt="">&lt;/p>
&lt;p>This provides a different perspective on convolutions: we can think of them as spectral filters. Consider a sine wave with frequency $f$ convolved with a Gaussian kernel. The power spectrum of a pure sine wave has a single peak at the frequency $f$. The power spectrum of a Gaussian kernel is itself a Gaussian centered at zero frequency, that is, a negative exponential of the squared frequency. The narrower the Gaussian in time, the gentler the decay in the frequency domain. If we multiply the two power spectra element-wise, we get essentially zero everywhere the two power spectra do not overlap. Only at the frequency $f$ do we get a non-zero value. Convolution in the time domain is equivalent to multiplication in the frequency domain.&lt;/p>
&lt;p>I think this is a very intuitive way to understand why convolutions work. A convolution filters out the frequencies that are not present in both the signal and the kernel. Only the features of the signal that share characteristics with the features in the kernel are amplitude modulated and preserved in the output.&lt;/p>
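&lt;p>The equivalence between time-domain convolution and frequency-domain multiplication is easy to check numerically. The sketch below uses random data and a hypothetical helper named &lt;code>fft_convolve&lt;/code>. Both inputs are zero-padded to length N + M - 1 so that the circular convolution computed by the FFT matches the linear convolution from &lt;code>np.convolve&lt;/code>.&lt;/p>

```python
import numpy as np

def fft_convolve(signal, kernel):
    """Linear convolution of two real 1D arrays via the FFT."""
    n = len(signal) + len(kernel) - 1   # zero-pad so circular equals linear
    spectrum = np.fft.rfft(signal, n) * np.fft.rfft(kernel, n)
    return np.fft.irfft(spectrum, n)

rng = np.random.default_rng(0)
signal = rng.normal(size=200)
kernel = rng.normal(size=15)

direct = np.convolve(signal, kernel, mode="full")   # sliding sum in the time domain
print(np.allclose(direct, fft_convolve(signal, kernel)))  # True
```

&lt;p>Note that &lt;code>np.convolve&lt;/code> flips the kernel before sliding it, which is the textbook definition of convolution. Multiplying by the complex conjugate of the kernel spectrum instead gives the unflipped sliding dot-product (cross-correlation).&lt;/p>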
&lt;p>We can do some other interesting things in the frequency domain, like filtering out noise and reconstructing the signal using the inverse Fourier transform. Let&amp;rsquo;s see how we can decompose a signal into its frequency components using the FFT and reconstruct the signal using the iFFT.&lt;/p>
&lt;p>&lt;em>Figure: a signal that has been reconstructed from the top-5 dominant frequencies.&lt;/em>&lt;/p>
&lt;p>Given a time series of values &lt;code>value&lt;/code> and timesteps &lt;code>ts&lt;/code>, the rFFT of the signal is computed using the following code snippet. We first standardize the signal by subtracting the mean and dividing by the standard deviation (z-score). Subtracting the mean removes the zero-Hz frequency (DC offset), which would otherwise dominate the power spectrum. We then compute the rFFT of the standardized signal to get the complex values (amplitude and phase). The amplitudes are the absolute values of the complex values. The frequencies are computed using the &lt;code>rfftfreq&lt;/code> function.&lt;/p>
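&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Examine the PSD using the rFFT&lt;/span>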
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>xmean = np.mean(value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>xvar = np.var(value)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>zvalues = (value - xmean) / np.sqrt(xvar)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>tsnorm = (ts - ts[&lt;span style="color:#f60">0&lt;/span>]) / (ts[-&lt;span style="color:#f60">1&lt;/span>] - ts[&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>rfft_values = np.fft.rfft(zvalues) &lt;span style="color:#0f0"># complex values&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>amplitudes = np.abs(rfft_values) &lt;span style="color:#0f0"># amplitudes&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>freqs = np.fft.rfftfreq(len(value), d=tsnorm[&lt;span style="color:#f60">1&lt;/span>] - tsnorm[&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We plot the power spectral density (PSD) of the signal, which tells us the energy at each frequency.&lt;/p>
&lt;p>&lt;em>Figure: power spectrum of the original signal.&lt;/em>&lt;/p>
&lt;p>Since period is the inverse of frequency, identifying the frequencies that carry most of the energy also reveals the dominant periods. The signal has a few dominant frequencies. We can select the top-5 frequencies (10.96, 9.96, 20.92, 21.91, 4.98) and reconstruct the signal using the inverse Fourier transform. This is equivalent to keeping the frequencies that capture most of the signal energy and removing the rest. It denoises the signal by discarding the low-energy components, including the high-frequency noise. Notice the reconstructed signal is a smoothed version of the original signal.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Extract the top 5 dominant frequencies&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>top5 = np.argsort(amplitudes)[::-&lt;span style="color:#f60">1&lt;/span>][:&lt;span style="color:#f60">5&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>top5_freqs = freqs[top5]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>print(&lt;span style="color:#87ceeb">&amp;#34;Top 5 frequencies:&amp;#34;&lt;/span>, top5_freqs.round(&lt;span style="color:#f60">2&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Reconstruct the signal using the top 5 frequencies&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>rfft_values_filtered = np.zeros_like(rfft_values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>rfft_values_filtered[top5] = rfft_values[top5]
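&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>recon = np.fft.irfft(rfft_values_filtered, n=len(value)) &lt;span style="color:#0f0"># n keeps odd-length signals at their original length&lt;/span>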
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>recon = recon * np.sqrt(xvar) + xmean
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>You can also low-pass filter the signal by choosing a cut-off frequency, setting all coefficients above it to zero, and then reconstructing the signal using the inverse Fourier transform.&lt;/p>
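&lt;p>A minimal sketch of such a low-pass filter is shown here on a synthetic signal. The helper name &lt;code>lowpass_fft&lt;/code>, the sample spacing, and the 20 Hz cut-off are all illustrative assumptions rather than values from the data above.&lt;/p>

```python
import numpy as np

def lowpass_fft(values, sample_spacing, cutoff):
    """Zero every rFFT coefficient above `cutoff` and invert back to the time domain."""
    spectrum = np.fft.rfft(values)
    freqs = np.fft.rfftfreq(len(values), d=sample_spacing)
    spectrum[freqs > cutoff] = 0                     # brick-wall low-pass filter
    return np.fft.irfft(spectrum, n=len(values))     # n preserves the original length

t = np.arange(1000) / 1000.0                         # one second sampled at 1 kHz
slow = np.sin(2 * np.pi * 5 * t)                     # 5 Hz component to keep
noisy = slow + 0.5 * np.sin(2 * np.pi * 120 * t)     # plus a 120 Hz disturbance
smooth = lowpass_fft(noisy, sample_spacing=1 / 1000, cutoff=20)
print(np.allclose(smooth, slow))  # True
```

&lt;p>The recovery is exact here only because both components fall on exact FFT bins. On real data a hard cut-off can introduce ringing, so a smoother roll-off may be preferable.&lt;/p>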
&lt;p>Finally, we can implement a 1D convolution using the FFT in PyTorch.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> torch
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> torch.fft &lt;span style="color:#f00">as&lt;/span> fft
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> scipy.fft &lt;span style="color:#f00">import&lt;/span> next_fast_len
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">conv1d_fft&lt;/span>(signal: torch.Tensor, kernel: torch.Tensor, dim: int=-&lt;span style="color:#f60">1&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Convolve two 1D tensors using FFT.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> signal (Tensor): Shape (batch_size, N) where N is the signal length
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> kernel (Tensor): Shape (batch_size, M) where M is the kernel length
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> dim (int, optional): Dimension along which to convolve. Default is -1.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Tensor: Shape (batch_size, N) containing the convolved signal
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> N = signal.size(dim) &lt;span style="color:#0f0"># signal length&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> M = kernel.size(dim) &lt;span style="color:#0f0"># kernel length&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> fast_len = next_fast_len(N + M - &lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> F_f = fft.rfft(signal, fast_len, dim=dim) &lt;span style="color:#0f0"># shape (batch_size, fast_len // 2 + 1)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> F_g = fft.rfft(kernel, fast_len, dim=dim) &lt;span style="color:#0f0"># shape (batch_size, fast_len // 2 + 1)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
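&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Multiplying by the conjugate gives the sliding dot-product (cross-correlation)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> F_fg = F_f * F_g.conj()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out = fft.irfft(F_fg, fast_len, dim=dim)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Shift by one so lag 0 is last, then keep the final N positions&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out = out.roll((-&lt;span style="color:#f60">1&lt;/span>,), dims=(dim,))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> idx = torch.arange(fast_len - N, fast_len, device=out.device)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out = out.index_select(dim, idx)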
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> out
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Zero-build Vue JS apps</title><link>https://asifr.com/vue/</link><pubDate>Sat, 14 Dec 2024 00:00:00 +0000</pubDate><guid>https://asifr.com/vue/</guid><description>
&lt;p>Here is a template for a Vue app that does not require a build step. This is useful for small projects where you want to quickly iterate on a UI without having to set up a build process using Node.js and the NPM package manager. I find this type of zero-build setup especially rewarding when I work on a project for a brief time, deploy it, and then come back to it after a few months to make a small change or fix a bug. A few of the benefits I&amp;rsquo;ve noticed:&lt;/p>
&lt;ul>
&lt;li>All the code is in one place so I don&amp;rsquo;t have to look across a large code repository to re-learn the structure of the UI.&lt;/li>
&lt;li>Vue helps organize the code in a standard way and has great documentation.&lt;/li>
&lt;li>The deployment process is to copy the HTML file and any other assets to a server and launch Caddy to serve the files.&lt;/li>
&lt;/ul>
&lt;p>My typical workflow is to start with the HTML template below and add data, methods, and computed properties as needed. &lt;a href="https://getbootstrap.com/">Bootstrap CSS&lt;/a> is well-known, documented, and easy to use so it&amp;rsquo;s a good place to start. &lt;a href="https://tailwindcss.com/">Tailwind CSS&lt;/a> is another option, but newer versions (v3+) of Tailwind require a build step to generate the CSS file and the &lt;a href="https://unpkg.com/browse/tailwindcss@2.2.19/dist/">last minified CDN version&lt;/a> (v2.2.19) is a large 2.93 MB file compared to Bootstrap&amp;rsquo;s 233 KB file.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-html" data-lang="html">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e5e5e5">&amp;lt;!DOCTYPE html&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;html lang=&lt;span style="color:#87ceeb">&amp;#34;en&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;head&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;meta charset=&lt;span style="color:#87ceeb">&amp;#34;UTF-8&amp;#34;&lt;/span> /&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;meta name=&lt;span style="color:#87ceeb">&amp;#34;viewport&amp;#34;&lt;/span> content=&lt;span style="color:#87ceeb">&amp;#34;width=device-width, initial-scale=1.0&amp;#34;&lt;/span> /&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;title&amp;gt;Document&amp;lt;/title&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;script src=&lt;span style="color:#87ceeb">&amp;#34;https://unpkg.com/vue@3/dist/vue.global.prod.js&amp;#34;&lt;/span>&amp;gt;&amp;lt;/script&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;link href=&lt;span style="color:#87ceeb">&amp;#34;https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css&amp;#34;&lt;/span> rel=&lt;span style="color:#87ceeb">&amp;#34;stylesheet&amp;#34;&lt;/span>/&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;script src=&lt;span style="color:#87ceeb">&amp;#34;https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js&amp;#34;&lt;/span>&amp;gt;&amp;lt;/script&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/head&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;body&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;div id=&lt;span style="color:#87ceeb">&amp;#34;app&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">&amp;lt;!-- Custom template --&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;example-component name=&lt;span style="color:#87ceeb">&amp;#34;Hello Vue!&amp;#34;&lt;/span>&amp;gt;&amp;lt;/example-component&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">&amp;lt;!-- Template tags should be defined outside the mounted app (#app) --&amp;gt;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;template id=&lt;span style="color:#87ceeb">&amp;#34;example-component&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;div v-text=&lt;span style="color:#87ceeb">&amp;#34;name&amp;#34;&lt;/span>&amp;gt;&amp;lt;/div&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/template&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;script type=&lt;span style="color:#87ceeb">&amp;#34;module&amp;#34;&lt;/span>&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// Custom Vue component
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">const&lt;/span> exampleComponent = {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// The template is defined in the &amp;lt;template&amp;gt; tag
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> template: document.getElementById(&lt;span style="color:#87ceeb">&amp;#34;example-component&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> props: [&lt;span style="color:#87ceeb">&amp;#34;name&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {};
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> mounted() {},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> methods: {},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">const&lt;/span> app = Vue.createApp({
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {};
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> computed: {},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">async&lt;/span> mounted() {},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> methods: {},
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// Register the custom component
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> components: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;example-component&amp;#34;&lt;/span>: exampleComponent,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> app.mount(&lt;span style="color:#87ceeb">&amp;#34;#app&amp;#34;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/script&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/body&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&amp;lt;/html&amp;gt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Static site build script</title><link>https://asifr.com/static-site-build-script/</link><pubDate>Sat, 07 Dec 2024 00:00:00 +0000</pubDate><guid>https://asifr.com/static-site-build-script/</guid><description>
&lt;p>This little shell script compiles a folder of markdown files into HTML files using &lt;a href="https://pandoc.org/">Pandoc&lt;/a>.&lt;/p>
&lt;p>First it preprocesses markdown files as &lt;a href="https://mustache.github.io/">Mustache&lt;/a> templates. This lets you use variables in your markdown files that are defined in a &lt;code>metadata.yaml&lt;/code> file or the frontmatter. The script then uses Pandoc to convert the markdown files to standalone HTML files.&lt;/p>
&lt;h2 id="usage">Usage&lt;/h2>
&lt;p>Save the source to a file named &lt;code>./dev&lt;/code> and make the script executable:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>chmod +x ./dev
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>On macOS, run the setup command to install the required dependencies:&lt;/p>
&lt;ul>
&lt;li>Pandoc&lt;/li>
&lt;li>Mustache&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>./dev setup
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To build the site run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>./dev build
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To force a full rebuild run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>./dev build -F
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To build a specific file run:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>./dev build content/notes.md
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>If you have caddy and npx installed, you can run a local server and watch for changes:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>./dev run
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="source">Source&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e5e5e5">#!/bin/bash
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#e5e5e5">&lt;/span>&lt;span style="color:#0f0"># Usage:&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># chmod +x dev&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># ./dev [COMMAND]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>set -e
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Files modified in the last 30 minutes will be rebuilt&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">MMIN&lt;/span>=&lt;span style="color:#f60">30&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">ERROR&lt;/span>=&lt;span style="color:#87ceeb">&amp;#39;\033[0;31m&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">SUCCESS&lt;/span>=&lt;span style="color:#87ceeb">&amp;#39;\033[0;32m&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">CODE&lt;/span>=&lt;span style="color:#87ceeb">&amp;#39;\033[0;36m&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">NC&lt;/span>=&lt;span style="color:#87ceeb">&amp;#39;\033[0m&amp;#39;&lt;/span> &lt;span style="color:#0f0"># No Color&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">cmd_helps&lt;/span>=()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>defhelp() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> local &lt;span style="color:#eedd82">command&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">1&lt;/span>?&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> local &lt;span style="color:#eedd82">text&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">2&lt;/span>?&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> local help_str
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">help_str&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">$(&lt;/span>printf &lt;span style="color:#87ceeb">&amp;#39; %-24s %s&amp;#39;&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$command&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$text&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">cmd_helps&lt;/span>+=(&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$help_str&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Print out help information&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cmd_help() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Script for performing dev tasks.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Usage: ./dev [COMMAND]&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Replace [COMMAND] with a word from the list below.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;COMMAND list:&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> str in &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">cmd_helps&lt;/span>[@]&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>; &lt;span style="color:#f00">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo -e &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$str&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">done&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>defhelp help &lt;span style="color:#87ceeb">&amp;#39;View all help.&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># ------------------------------------------------------------------------------&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Repo&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># ------------------------------------------------------------------------------&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cmd_clean() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Cleaning up...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rm -f public/*.html
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rm -f public/*.xml
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>defhelp clean &lt;span style="color:#87ceeb">&amp;#39;Clean up.&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cmd_setup() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Setting up...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># check if jq is installed&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> ! command -v jq &amp;amp;&amp;gt; /dev/null; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Installing jq...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> brew install jq
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># check if pandoc is installed&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> ! command -v pandoc &amp;amp;&amp;gt; /dev/null; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Installing pandoc...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> brew install pandoc
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># check if mustache is installed&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> ! command -v mustache &amp;amp;&amp;gt; /dev/null; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Installing mustache...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> go install github.com/cbroglie/mustache/cmd/mustache@latest
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Build a file or all files, optionally force a full rebuild&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Build all files that have changed in the last 30 minutes&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># ./dev build&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Build all files&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># ./dev build -F&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Build a specific file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># ./dev build file.md&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cmd_build() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Building...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">REBUILD&lt;/span>=&lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Check if given -F flag to force a full rebuild&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># ignore the flag if it is not given&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">while&lt;/span> getopts &lt;span style="color:#87ceeb">&amp;#34;F&amp;#34;&lt;/span> opt; &lt;span style="color:#f00">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">case&lt;/span> &lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">opt&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span> in
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> F)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">REBUILD&lt;/span>=&lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ;;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">\?&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># ignore unknown flags&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ;;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">esac&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">done&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># check if any files in templates/ have changed in the last 30 minutes, if so, force a full rebuild&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> [ &lt;span style="color:#eedd82">$REBUILD&lt;/span> -eq &lt;span style="color:#f60">0&lt;/span> ]; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> [ &lt;span style="color:#f00">$(&lt;/span>find templates -type f -mmin -&lt;span style="color:#eedd82">$MMIN&lt;/span> | wc -l&lt;span style="color:#f00">)&lt;/span> -gt &lt;span style="color:#f60">0&lt;/span> ]; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Templates have changed, forcing a full rebuild...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">REBUILD&lt;/span>=&lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Markdown extension (e.g. md, markdown, mdown).&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">MEXT&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;md&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># if rebuild=0 and a file name is given, build that file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> [ &lt;span style="color:#eedd82">$REBUILD&lt;/span> -eq &lt;span style="color:#f60">0&lt;/span> ] &amp;amp;&amp;amp; [ &lt;span style="color:#eedd82">$#&lt;/span> -eq &lt;span style="color:#f60">1&lt;/span> ]; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">FILES&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$1&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Only check for files if FILES is not set&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> [ -z &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$FILES&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> ]; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># get all markdown files that have changed in the last 30 minutes if not forcing a full rebuild&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># otherwise, get all markdown files&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> [ &lt;span style="color:#eedd82">$REBUILD&lt;/span> -eq &lt;span style="color:#f60">0&lt;/span> ]; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Incremental build...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">FILES&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>find content -type f -name &lt;span style="color:#87ceeb">&amp;#34;*.&lt;/span>&lt;span style="color:#eedd82">$MEXT&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> -mmin -&lt;span style="color:#eedd82">$MMIN&lt;/span>&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Full build...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">FILES&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>find content -type f -name &lt;span style="color:#87ceeb">&amp;#34;*.&lt;/span>&lt;span style="color:#eedd82">$MEXT&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># if there are no files, exit&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> [ -z &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$FILES&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> ]; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo -e &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">ERROR&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">No files to process!&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">NC&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> exit &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># Location of the root directory with this script, templates/, content/, public/&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">ROOT&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>pwd&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Files to process:&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;---&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$FILES&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;---&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># build each file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> file in &lt;span style="color:#eedd82">$FILES&lt;/span>; &lt;span style="color:#f00">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Building: &lt;/span>&lt;span style="color:#eedd82">$file&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># get the file name without the extension&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">FILENAME&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>basename -- &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$file&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">FILENAME&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">FILENAME&lt;/span>%.*&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># gather yaml front matter from the file if it exists using sed and awk&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">frontmatter&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>awk &lt;span style="color:#87ceeb">&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> # Start capturing when we find the opening --- line
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> /^---$/ { if (capture) exit; capture=1; next }
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> # Print lines only if capture is active
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> capture { print }
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#39;&lt;/span> &amp;lt; &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$file&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># strip leading and trailing --- from the frontmatter and add it back&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># we do this as a sanity check in case the file does not have frontmatter properly formatted&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">frontmatter&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>echo &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$frontmatter&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> | sed &lt;span style="color:#87ceeb">&amp;#39;s/^---//&amp;#39;&lt;/span> | sed &lt;span style="color:#87ceeb">&amp;#39;s/---$//&amp;#39;&lt;/span>&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">frontmatter&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>echo &lt;span style="color:#87ceeb">$&amp;#39;\n&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;---&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">$&amp;#39;\n&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$frontmatter&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">$&amp;#39;\n&amp;#39;&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;---&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># collect all context data in one place&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">context&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>cat content/metadata.yaml &amp;lt;(echo &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$frontmatter&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># preprocess the markdown file with mustache, use the frontmatter and metadata.yaml as context&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># cat the frontmatter and metadata.yaml first, then pipe them to mustache along with the markdown file&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#eedd82">inputtext&lt;/span>=&lt;span style="color:#f00">$(&lt;/span>cat &amp;lt;(echo &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$context&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span> | mustache &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$file&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> pandoc -r markdown+simple_tables+table_captions+yaml_metadata_block+auto_identifiers+header_attributes+fenced_code_blocks+fenced_code_attributes+tex_math_dollars &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> -w html &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> --tab-stop=&lt;span style="color:#f60">2&lt;/span> &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> --toc &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> --mathjax &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> --metadata-file content/metadata.yaml &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> -V &lt;span style="color:#eedd82">builddate&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">$(&lt;/span>date +&lt;span style="color:#87ceeb">&amp;#34;%a, %d %b %Y %H:%M:%S %z&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> -V &lt;span style="color:#eedd82">year&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#f00">$(&lt;/span>date +&lt;span style="color:#87ceeb">&amp;#34;%Y&amp;#34;&lt;/span>&lt;span style="color:#f00">)&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> --template=./templates/bear.html &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> -o ./public/&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$FILENAME&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>.html &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> &amp;lt;(echo &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$inputtext&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">done&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># print a success message&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo -e &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">SUCCESS&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">Build complete!&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">NC&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>defhelp build &lt;span style="color:#87ceeb">&amp;#39;Build the site.&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>cmd_run() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Starting server...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> npx nodemon --watch &lt;span style="color:#87ceeb">&amp;#39;content/**/*&amp;#39;&lt;/span> -e md,html,yaml --exec &lt;span style="color:#87ceeb">&amp;#39;./dev build&amp;#39;&lt;/span> &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> &amp;amp; caddy file-server --listen :8000 --root ./public &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> &amp;amp; wait
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># --------------------------------------------------------------------------&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Core script logic&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># -----------------------------------------------------------------------------&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>silent() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$@&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &amp;gt; /dev/null 2&amp;gt;&amp;amp;&lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># If no command given&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">if&lt;/span> [ &lt;span style="color:#eedd82">$#&lt;/span> -eq &lt;span style="color:#f60">0&lt;/span> ]; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo -e &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">ERROR&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">ERROR: This script requires a command!&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">NC&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> cmd_help
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> exit &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">cmd&lt;/span>=&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$1&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>shift
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">if&lt;/span> silent type &lt;span style="color:#87ceeb">&amp;#34;cmd_&lt;/span>&lt;span style="color:#eedd82">$cmd&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>; &lt;span style="color:#f00">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;cmd_&lt;/span>&lt;span style="color:#eedd82">$cmd&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span> &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#eedd82">$@&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> exit &lt;span style="color:#eedd82">$?&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">else&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo -e &lt;span style="color:#87ceeb">&amp;#34;&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">ERROR&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">ERROR: Unknown command!&lt;/span>&lt;span style="color:#87ceeb">${&lt;/span>&lt;span style="color:#eedd82">NC&lt;/span>&lt;span style="color:#87ceeb">}&lt;/span>&lt;span style="color:#87ceeb">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#87ceeb">&amp;#34;Type &amp;#39;./dev help&amp;#39; for available commands.&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> exit &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">fi&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>ECG beat detection algorithm</title><link>https://asifr.com/ecg-beat-detection/</link><pubDate>Wed, 13 Mar 2024 00:00:00 +0000</pubDate><guid>https://asifr.com/ecg-beat-detection/</guid><description>
&lt;p>A basic component of processing electrocardiogram (ECG) signals is detecting the heartbeat. Beat detection is used to calculate the heart rate, to derive measures of heart rate variability, to develop signal quality indicators, and to detect diseases. There are thousands of publications and strategies for detecting the R-peak of the QRS complex in an ECG signal, with varying degrees of accuracy (see the section on &lt;a href="#other-beat-detection-algorithms">other beat detection algorithms&lt;/a> for a survey). Methods range from threshold-based peak detectors, to wavelet-based signal processing, to probabilistic combinations of multiple methods.&lt;/p>
&lt;p>In this chapter we will use the Pan-Tompkins algorithm, one of the most widely implemented peak-detection algorithms, to detect the R-peaks of the ECG signal. Data for this exercise is from the 2017 PhysioNet Challenge, which aimed to classify atrial fibrillation from single-channel ECG signals. The data was sampled at 300 Hz and band-pass filtered. First, we start with a short introduction to ECG wave analysis.&lt;/p>
&lt;h2 id="ecg-waves">ECG waves&lt;/h2>
&lt;p>ECG analysis starts with understanding the wave morphology and intervals.&lt;/p>
&lt;p>&lt;img src="https://asifr.com/images/ecg-morphology.png" alt="">&lt;/p>
&lt;p>Features derived from a single beat in an ECG. Picture from &lt;a href="https://www.documents.philips.com/doclib/enc/fetch/2000/4504/577242/577243/577246/581601/711562/DXL_ECG_Algorithm_Physician_s_Guide_(ENG)_Ed.2.pdf">Philips DXL ECG Algorithm Physician Guide&lt;/a>&lt;/p>
&lt;p>The P-wave reflects atrial depolarization. The amplitude of the P-wave decreases in diseases like atrial fibrillation, which is a type of arrhythmia or abnormal heartbeat. Therefore, we typically want to quantify the amplitude and duration of the P-wave for AFib classification. The distance between the P-wave onset and the onset of the QRS complex is the PR interval, with a normal duration of 120-220 ms.&lt;/p>
&lt;p>The QRS complex reflects ventricular depolarization, dominated by the left ventricle (since the electrical vector of the left ventricle is much larger than that of the right ventricle). A short QRS duration suggests that the ventricles are activating normally, while a broad QRS duration indicates that ventricular activation is slow and there could be a dysfunction in the electrical conduction system of the heart. The R-peak of the QRS complex is used to calculate the instantaneous heart rate from the interval between subsequent R-peaks (RR-interval). An RR-interval of 400 ms is equivalent to an instantaneous heart rate of 150 beats per minute ($60 \text{ s/min} \times 1000 \text{ ms/s} \div 400 \text{ ms} = 150 \text{ beats/min}$).&lt;/p>
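&lt;p>The same conversion can be sketched in a few lines of NumPy. This only illustrates the arithmetic above; the helper name &lt;code>rr_to_bpm&lt;/code> and the sample intervals are made up for the example.&lt;/p>

```python
import numpy as np


def rr_to_bpm(rr_ms):
    """Convert RR-intervals in milliseconds to instantaneous heart rate in beats per minute."""
    rr_ms = np.asarray(rr_ms, dtype=float)
    # 60 s/min * 1000 ms/s = 60000 ms per minute
    return 60_000.0 / rr_ms


# hypothetical RR-intervals: 400 ms maps to 150 bpm, 800 ms to 75 bpm, 1000 ms to 60 bpm
print(rr_to_bpm([400, 800, 1000]))
```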
&lt;p>The ST segment is another important morphological feature of the ECG wave since ST elevation and depression are both associated with heart dysfunction like acute myocardial ischemia or ST-elevation myocardial infarction (STEMI). Elevation or depression are calculated as the difference (in millimeters) between the J point (where the ST segment starts) and the PR segment. Finally, the T-wave reflects repolarization of the contractile cells and is also associated with a range of heart conditions.&lt;/p>
&lt;h2 id="pan-thompkins-algorithm">Pan-Tompkins algorithm&lt;/h2>
&lt;p>The Pan-Tompkins algorithm is widely used and is suitable for real-time continuous QRS detection. The algorithm is based on analysis of the slope, width, and amplitude of the ECG using a series of filters. An ECG signal first goes through a bandpass filter, then a differentiator, a squaring operation, a moving window integrator, and finally adaptive thresholding and search-back to find the R-peak.&lt;/p>
&lt;p>&lt;img src="https://asifr.com/images/pan-thompkins.png" alt="">&lt;/p>
&lt;p>&lt;em>Pan-Tompkins algorithm for QRS detection.&lt;/em>&lt;/p>
&lt;p>Raw ECG signals include muscle noise (from respiration), motion artifacts, the QRS complex, and the P and T waves. The band pass filter is designed to match the spectrum of the QRS complex and attenuates muscle noise, 60 Hz interference, baseline wander, and T-wave interference. A pass band of 5-15 Hz maximizes the QRS energy.&lt;/p>
&lt;blockquote>
&lt;p>Filtering on waveforms can have negative effects. While low pass filters successfully reduce noise in ECG traces, they also reduce the QRS amplitude. High pass filters (e.g. low cutoff at 0.5 Hz) reduce baseline wander, but also introduce ST distortion. Using forward/backward filtering (highpass, reverse time, highpass, reverse time) removes most of the distortion introduced by high pass filters on the ST segment.&lt;/p>&lt;/blockquote>
&lt;p>Here we use cascading filters combining a low pass filter and a high pass filter to mimic a bandpass filter. The filter attenuates the P and T waves (which peak at &amp;lt;5 Hz), which is a desired feature since the goal is to detect the QRS complex.&lt;/p>
&lt;p>The filtered ECG is then differenced and squared to amplify the QRS complex. The derivative filter further suppresses low frequency components of P and T waves. Squaring makes the signal positive and enhances the derivatives by amplifying the high frequency QRS complex.&lt;/p>
&lt;p>Next, a moving average filter over a 150 ms window captures the duration of the QRS complex and gives us the integrated signal. This suppresses the smaller oscillations by smoothing out the residual high frequency components. Here we have to choose a suitable window length for the moving average. Large windows merge the QRS complex and T wave together, and small windows produce several peaks at the QRS complex, making it difficult to find the R-peak. In addition to detecting the QRS complex, the moving average filter gives us the width of the QRS complex.&lt;/p>
&lt;p>There are numerous heuristics for peak detection from the integrated signal (e.g. simple thresholding of the moving window integral). Pan-Tompkins proposes adaptive thresholding and search-back to select the time ranges that correspond to QRS complexes, adapting to changes in the ECG by computing running estimates of signal and noise peaks. Here I instead smooth the integrated signal with a Gaussian filter to get the energy of the signal. The peaks then correspond to zero-crossings of the first difference, where the slope changes from positive to negative ($x[i+1] &amp;lt; x[i]$).&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
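&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> scipy.ndimage
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> scipy.signal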
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>ecg = np.loadtxt(&lt;span style="color:#87ceeb">&amp;#34;ecg.txt&amp;#34;&lt;/span>) &lt;span style="color:#0f0"># load the ECG signal&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>fs = &lt;span style="color:#f60">300&lt;/span> &lt;span style="color:#0f0"># Hz&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>tvec = np.arange(len(ecg)) / fs &lt;span style="color:#0f0"># time vector&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>max_QRS_duration = &lt;span style="color:#f60">0.150&lt;/span> &lt;span style="color:#0f0"># sec&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>low_cutoff = &lt;span style="color:#f60">5&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>high_cutoff = &lt;span style="color:#f60">15&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>window_size = int(max_QRS_duration * fs)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># apply a bandpass filter to the ECG signal&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>lowpass = scipy.signal.butter(&lt;span style="color:#f60">1&lt;/span>, high_cutoff / (fs / &lt;span style="color:#f60">2.0&lt;/span>), &lt;span style="color:#87ceeb">&amp;#34;low&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>highpass = scipy.signal.butter(&lt;span style="color:#f60">1&lt;/span>, low_cutoff / (fs / &lt;span style="color:#f60">2.0&lt;/span>), &lt;span style="color:#87ceeb">&amp;#34;high&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>ecg_low = scipy.signal.filtfilt(*lowpass, x=ecg)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>ecg_band = scipy.signal.filtfilt(*highpass, x=ecg_low)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>diff = np.diff(ecg_band)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>squared = np.square(diff)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># moving average filter&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># zero-pad the start of the signal and convolve to get the integrated signal&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>mwa = np.pad(squared, (window_size - &lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">0&lt;/span>), &lt;span style="color:#87ceeb">&amp;#34;constant&amp;#34;&lt;/span>, constant_values=(&lt;span style="color:#f60">0&lt;/span>, &lt;span style="color:#f60">0&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>mwa = np.convolve(mwa, np.ones(window_size), &lt;span style="color:#87ceeb">&amp;#34;valid&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">for&lt;/span> i in range(&lt;span style="color:#f60">1&lt;/span>, window_size):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mwa[i - &lt;span style="color:#f60">1&lt;/span>] = mwa[i - &lt;span style="color:#f60">1&lt;/span>] / i
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>mwa[window_size - &lt;span style="color:#f60">1&lt;/span> :] = mwa[window_size - &lt;span style="color:#f60">1&lt;/span> :] / window_size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>mwa[: int(max_QRS_duration * fs * &lt;span style="color:#f60">2&lt;/span>)] = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># smooth the moving window integrated signal with a gaussian filter and take the derivative&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>energy = scipy.ndimage.gaussian_filter1d(mwa, fs / &lt;span style="color:#f60">8.0&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>energy_diff = np.diff(energy)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># peaks are where the derivative crosses zero from positive to negative; shift back by half the window size&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>zero_crossings = (energy_diff[:-&lt;span style="color:#f60">1&lt;/span>] &amp;gt; &lt;span style="color:#f60">0&lt;/span>) &amp;amp; (energy_diff[&lt;span style="color:#f60">1&lt;/span>:] &amp;lt; &lt;span style="color:#f60">0&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>zero_crossings = np.flatnonzero(zero_crossings)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>zero_crossings -= int(window_size / &lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://asifr.com/images/ecg-beat-detections.png" alt="">&lt;/p>
&lt;p>&lt;em>Stages of the beat detection algorithm. The vertical red lines indicate the peaks of the QRS complex.&lt;/em>&lt;/p>
&lt;p>The figure above visualizes each step of the beat detection algorithm. The vertical red lines mark the zero-crossings (&lt;code>zero_crossings&lt;/code>), which line up with the peaks of the ECG signal.&lt;/p>
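&lt;p>For comparison, the adaptive thresholding described by Pan and Tompkins can be sketched roughly as follows. This is a simplified illustration and not the code used in this post: it keeps only the running signal and noise estimates with the 1/8 and 7/8 update weights and the 0.25 threshold factor from the 1985 paper, and it omits search-back and the refractory period. Initializing the estimates from the largest and smallest candidate amplitude is an assumption made to keep the example short, and the function name and candidate peaks are hypothetical.&lt;/p>

```python
import numpy as np


def adaptive_threshold_peaks(peak_idx, peak_amp):
    """Classify candidate peaks of the integrated signal as QRS or noise.

    Simplified sketch: running signal/noise estimates only,
    no search-back and no refractory period.
    """
    peak_amp = np.asarray(peak_amp, dtype=float)
    spki = peak_amp.max()  # assumption: crude initial signal level
    npki = peak_amp.min()  # assumption: crude initial noise level
    qrs = []
    for idx, amp in zip(peak_idx, peak_amp):
        # threshold sits a quarter of the way from the noise to the signal level
        threshold = npki + 0.25 * (spki - npki)
        if amp > threshold:
            qrs.append(idx)
            spki = 0.125 * amp + 0.875 * spki  # update running signal estimate
        else:
            npki = 0.125 * amp + 0.875 * npki  # update running noise estimate
    return np.array(qrs, dtype=int)


# hypothetical candidate peaks (sample index, amplitude of the integrated signal)
candidates_idx = [10, 50, 100, 140, 200]
candidates_amp = [1.0, 0.05, 0.9, 0.1, 1.1]
print(adaptive_threshold_peaks(candidates_idx, candidates_amp))  # [ 10 100 200]
```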
&lt;h2 id="other-beat-detection-algorithms">Other beat detection algorithms&lt;/h2>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Algorithm&lt;/th>
&lt;th>Reference&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Zeelenberg (1979)&lt;/td>
&lt;td>Engelse, W.A.H., Zeelenberg, C (1979). A single scan algorithm for QRS detection and feature extraction, IEEE Comp. in Cardiology, vol. 6, pp. 37-42.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Pan (1985)&lt;/td>
&lt;td>Pan, J., &amp;amp; Tompkins, W. J. (1985). A real-time QRS detection algorithm. IEEE transactions on biomedical engineering, (3), 230-236.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Hamilton (2002)&lt;/td>
&lt;td>Hamilton, P. (2002, September). Open source ECG analysis. In Computers in cardiology (pp. 101-104). IEEE.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Zong (2003)&lt;/td>
&lt;td>Zong, W., Heldt, T., Moody, G. B., &amp;amp; Mark, R. G. (2003). An open-source algorithm to detect onset of arterial blood pressure pulses. In Computers in Cardiology, 2003 (pp. 259-262). IEEE.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Christov (2004)&lt;/td>
&lt;td>Ivaylo I. Christov, Real time electrocardiogram QRS detection using combined adaptive threshold, BioMedical Engineering OnLine 2004, vol. 3:28, 2004.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Elgendi (2010)&lt;/td>
&lt;td>Elgendi, Mohamed &amp;amp; Jonkman, Mirjam &amp;amp; De Boer, Friso. (2010). Frequency Bands Effects on QRS Detection. The 3rd International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS2010). 428-431.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Kalidas (2017)&lt;/td>
&lt;td>Vignesh Kalidas and Lakshman Tamil (2017). Real-time QRS detector using Stationary Wavelet Transform for Automated ECG Analysis. In: 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE). Uses the Pan and Tompkins thresholding.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Nabian (2018)&lt;/td>
&lt;td>Nabian, M., Yin, Y., Wormwood, J., Quigley, K. S., Barrett, L. F., Ostadabbas, S. (2018). An Open-Source Feature Extraction Tool for the Analysis of Peripheral Physiological Data. IEEE Journal of Translational Engineering in Health and Medicine, 6, 1-11.&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Rodrigues (2021)&lt;/td>
&lt;td>Rodrigues, Tiago &amp;amp; Samoutphonh, Sirisack &amp;amp; Plácido da Silva, Hugo &amp;amp; Fred, Ana. (2021). A Low-Complexity R-peak Detection Algorithm with Adaptive Thresholding for Wearable Devices.&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="additional-resources">Additional resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://ecgwaves.com/topic/ecg-normal-p-wave-qrs-complex-st-segment-t-wave-j-point/">Understanding ECG waves&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.documents.philips.com/doclib/enc/fetch/2000/4504/577242/577243/577246/581601/711562/DXL_ECG_Algorithm_Physician_s_Guide_(ENG)_Ed.2.pdf">Philips DXL ECG Algorithm Physician Guide&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Heart Rate Variability and Atrial Fibrillation</title><link>https://asifr.com/heart-rate-variability/</link><pubDate>Wed, 13 Mar 2024 00:00:00 +0000</pubDate><guid>https://asifr.com/heart-rate-variability/</guid><description>
&lt;p>Atrial fibrillation (AFib) is a sustained cardiac arrhythmia and is classified according to the temporal pattern of irregularly spaced heart beats. Patients with AFib have cardiac hemodynamic dysfunction, up to a 2-fold increase in risk of mortality, and a 6-fold increase in risk of stroke. The electrocardiographic presentation of AFib is continuous and rapid irregular electrical activity of the atria and absence of the P-wave because ventricular response is poorly coupled with atrial activity. These hallmark characteristics of AFib make ECG monitoring the most convenient tool to assist AFib diagnosis. Automated algorithms rely on one or more characteristics of the waveform including irregular rhythm, high-frequency chaotic atrial waveform, and absence of P waves. Measures of heart rate variability (HRV) and morphological analysis are the most common approaches.&lt;/p>
&lt;h2 id="r-r-interval">R-R interval&lt;/h2>
&lt;p>We can use a &lt;a href="https://asifr.com/content/notes/ecg-beat-detection.md">beat detection&lt;/a> algorithm to find the R-peaks of the QRS complexes, which are recorded as indices $R_{peaks}[n]$ for the $n$-th beat. We can convert the indices to an R-R interval (RRI) in units of seconds by taking the first difference and dividing by the sampling rate $f_s$:&lt;/p>
&lt;p>$$RRI[n] = \frac{R_{peaks}[n] - R_{peaks}[n-1]}{f_s}$$&lt;/p>
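&lt;p>As a minimal sketch of this formula, the first difference of the peak indices divided by the sampling rate gives the RRI in seconds. The function name and the example indices are hypothetical, and the 300 Hz sampling rate is assumed to match the ECG data from the beat detection note.&lt;/p>

```python
import numpy as np


def rr_intervals(r_peaks, fs):
    """Return RR-intervals in seconds from R-peak sample indices and sampling rate fs (Hz)."""
    r_peaks = np.asarray(r_peaks)
    # RRI[n] = (R_peaks[n] - R_peaks[n-1]) / fs
    return np.diff(r_peaks) / fs


r_peaks = np.array([120, 360, 615, 850])  # hypothetical R-peak sample indices
rri = rr_intervals(r_peaks, fs=300)
print(rri)  # three intervals: 0.8 s, 0.85 s, and about 0.783 s
```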
&lt;h2 id="poincare-plots">Poincaré plots&lt;/h2>
&lt;p>A Poincaré plot is a diagram in which the RR-interval ($RRI[n]$) is plotted as a function of the previous RR-interval ($RRI[n-1]$) and is a visual representation of heart rate variability. Patients with AFib have irregular RRI, so the points are more widely dispersed. We can quantify this dispersion by fitting an ellipse and measuring the standard deviation along the major and minor axes. The area of the fitted ellipse is also larger in AFib patients.&lt;/p></description></item><item><title>Javascript custom events</title><link>https://asifr.com/javascript-custom-events/</link><pubDate>Mon, 08 Jan 2024 00:00:00 +0000</pubDate><guid>https://asifr.com/javascript-custom-events/</guid><description>
&lt;p>Custom events are a way to decouple parts of a JavaScript application. Trigger an event from the onclick handler of an element (&lt;code>onclick=&amp;quot;triggerEvent(this,'my-event')&amp;quot;&lt;/code>), and listen for it elsewhere in the application.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">function&lt;/span> triggerEvent(el, name) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> document.body.dispatchEvent(&lt;span style="color:#f00">new&lt;/span> CustomEvent(&lt;span style="color:#87ceeb">&amp;#39;app:&amp;#39;&lt;/span> + name, { detail: el }))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">function&lt;/span> main() {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> document.body.addEventListener(&lt;span style="color:#87ceeb">&amp;#39;app:my-event&amp;#39;&lt;/span>, &lt;span style="color:#f00">function&lt;/span> (event) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> console.log(event.detail) &lt;span style="color:#0f0">// element that triggered the event
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>document.addEventListener(&lt;span style="color:#87ceeb">&amp;#39;DOMContentLoaded&amp;#39;&lt;/span>, main)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Audio biosignal processing of phonocardiograms</title><link>https://asifr.com/phonocardiograms/</link><pubDate>Fri, 01 Jul 2022 00:00:00 +0000</pubDate><guid>https://asifr.com/phonocardiograms/</guid><description>
&lt;p>A phonocardiogram (PCG) is a non-invasive assessment of the mechanical function of the heart. Cardiac auscultation and the analysis of the phonocardiogram can unveil fundamental clinical information regarding heart malfunctioning caused by congenital and acquired heart disease. This is achieved by detecting abnormal sound waves, or heart murmurs, in the PCG signal. This article uses data from the &lt;a href="https://moody-challenge.physionet.org/2022">2022 PhysioNet challenge&lt;/a> to explore the spectral properties of PCG signals.&lt;/p>
&lt;p>Below we see the time domain signal for a 5 sec window of PCG data. We can clearly see the S1 and S2 waves, which correspond to the beginning and end of the systolic phase of the heart beat, respectively.&lt;/p>
&lt;iframe src="https://asifr.com/images/pcg_figure1.html" width="100%" height="400" frameborder="0" allowfullscreen>&lt;/iframe>
&lt;p>The corresponding spectrogram shows the power at each frequency over time. The beats are clearly visible in both the time and frequency domains. There is some high frequency noise at about 1.5sec that we will next remove using a digital filter.&lt;/p>
&lt;p>&lt;img src="https://asifr.com/images/pcg_figure2.png" alt="">&lt;/p>
&lt;p>&lt;em>Spectrogram of the PCG signal&lt;/em>&lt;/p>
&lt;p>A Butterworth low-pass filter applied to this signal with 250 Hz cutoff frequency removes the high frequency noise at 1.5sec.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> scipy.signal
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">butter&lt;/span>(sig, freq, order, cutoff_hz, btype=&lt;span style="color:#87ceeb">&amp;#34;low&amp;#34;&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># order is the filter order; the cutoff is normalized by the Nyquist frequency (freq / 2)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> b, a = scipy.signal.butter(order, cutoff_hz / (freq / &lt;span style="color:#f60">2.0&lt;/span>), btype)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> scipy.signal.filtfilt(b, a, sig)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;iframe src="https://asifr.com/images/pcg_figure3.html" width="100%" height="400" frameborder="0" allowfullscreen>&lt;/iframe>
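&lt;p>To see what this kind of zero-phase low-pass filter does, here is a small self-contained check on a synthetic signal. Everything in it is an assumption for illustration: the 4000 Hz sampling rate, the 4th-order filter, the helper name &lt;code>lowpass&lt;/code>, and the two sine components standing in for heart sounds and high frequency noise.&lt;/p>

```python
import numpy as np
import scipy.signal


def lowpass(sig, fs, cutoff_hz, order=4):
    """Zero-phase Butterworth low-pass filter (forward and backward pass)."""
    b, a = scipy.signal.butter(order, cutoff_hz / (fs / 2.0), "low")
    return scipy.signal.filtfilt(b, a, sig)


fs = 4000  # Hz, assumed sampling rate
t = np.arange(0, 1.0, 1.0 / fs)
slow = np.sin(2 * np.pi * 50 * t)  # component below the cutoff
noise = 0.5 * np.sin(2 * np.pi * 1500 * t)  # component above the cutoff
filtered = lowpass(slow + noise, fs, cutoff_hz=250)

# away from the edges, the filtered signal should closely follow the slow component
err = np.max(np.abs(filtered[200:-200] - slow[200:-200]))
print(err)
```

&lt;p>Because &lt;code>filtfilt&lt;/code> runs the filter forward and backward, the output is not delayed relative to the input, which keeps the S1 and S2 timing intact.&lt;/p>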
&lt;p>Most of the signal of interest is below 500 Hz, so low pass filtering removes the high frequency components and leaves the cardiac cycles intact. The figure below shows the low pass filtered signal on top and the high pass filtered signal on the bottom, each in the time domain (left) and frequency domain (right).&lt;/p>
&lt;p>&lt;img src="https://asifr.com/images/pcg_figure4.png" alt="">&lt;/p>
&lt;p>&lt;em>Low and high passed signals&lt;/em>&lt;/p>
&lt;p>Zooming in shows the cardiac cycles in more detail.&lt;/p>
&lt;p>&lt;img src="https://asifr.com/images/pcg_figure5.png" alt="">&lt;/p>
&lt;p>&lt;em>Original, low, and high passed signals&lt;/em>&lt;/p></description></item><item><title>Time series similarity with random convolutional features and locality-sensitive hashing</title><link>https://asifr.com/neural-time-series-hash/</link><pubDate>Fri, 01 Jul 2022 00:00:00 +0000</pubDate><guid>https://asifr.com/neural-time-series-hash/</guid><description>
&lt;p>Given a time series, like temperature readings from a collection of sensors, we want to find sensors that have similar readings. This is a common problem in applications like sensor networks, IoT, and monitoring systems.&lt;/p>
&lt;p>One approach is using random convolutional features to encode the signal and then use locality-sensitive hashing (LSH) to find similar signals. This approach is very fast and can be used to search through millions of time series signals. The time series neural hashing technique introduced here is a fast general-purpose search and retrieval algorithm.&lt;/p>
&lt;p>Our specifications are:&lt;/p>
&lt;ol>
&lt;li>We want to index a large number of time series signals and quickly retrieve similar time series in a way that scales computationally and has low storage requirements&lt;/li>
&lt;li>Signals are of variable length with stretches of missing values. Imputation of missing values is not feasible&lt;/li>
&lt;li>The representation should capture both the shape/structure and magnitude of the signal&lt;/li>
&lt;li>Minimal feature engineering&lt;/li>
&lt;/ol>
&lt;p>Random convolutional neural hashing&lt;/p>
&lt;p>Given a time series $X \in \mathbb{R}^{M \times N}$ of $M$ samples each of length $N$ containing missing values, we want to encode each sequence $x_m \in \mathbb{R}^{N}$ into a fixed length embedding vector that we can use for fast similarity search.&lt;/p>
&lt;ol>
&lt;li>Normalize and clip $x_m$ in the range [0,1].&lt;/li>
&lt;li>Fit a random convolutional encoder and save the parameters.&lt;/li>
&lt;li>Embed $x_m$ to $e_m \in \mathbb{R}^{k}$ using the convolutional encoder.&lt;/li>
&lt;li>Concatenate the sample stats $s_m=\{\min, \max, \text{mean}, P_{10}, P_{25}, P_{50}, P_{90}\}$ from the normalized sequence to $e_m$ such that $[e_m; s_m] \in \mathbb{R}^{k+7}$.&lt;/li>
&lt;li>Weight sample stats $s_m$ by $\alpha$ and convolutional encodings $e_m$ by $1-\alpha$. ($\alpha=0.75$)&lt;/li>
&lt;li>Define a seed matrix of shape $D \times (k + 7)$ from a standard normal distribution ($D=256$).&lt;/li>
&lt;li>Calculate the hash string for each sample (e.g. &lt;code>e9df77eb7c692f16&lt;/code>)&lt;/li>
&lt;li>Retrieval: Given a query hash, compute the hamming distance to find the nearest neighbor.&lt;/li>
&lt;/ol>
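&lt;p>The weighting, hashing, and retrieval steps can be sketched as below. This is an illustration under stated assumptions, not the exact implementation: it assumes the hash bits are the signs of the projections onto the seed matrix (a standard random-projection LSH), packs the bits into a hex string with NumPy, and uses random stand-ins for the convolutional embeddings $e_m$ and stats $s_m$. The function names and $k=32$ are hypothetical. With $D=256$ this packing yields 64 hex characters, so the shorter example hash above suggests the original packing or bit count may differ.&lt;/p>

```python
import numpy as np


def hash_bits(features, seed_matrix):
    """Project (M, k+7) features onto the (D, k+7) seed matrix and keep the sign bits."""
    return (features @ seed_matrix.T) > 0


def to_hex(bits):
    """Pack one row of boolean hash bits into a hex string."""
    return np.packbits(bits).tobytes().hex()


def hamming(a, b):
    """Number of differing bits between two boolean hash vectors."""
    return int(np.count_nonzero(a != b))


rng = np.random.default_rng(0)
k, D, alpha = 32, 256, 0.75

e = rng.normal(size=(3, k))  # stand-in for convolutional embeddings e_m
s = rng.uniform(size=(3, 7))  # stand-in for the 7 sample stats s_m
e[1] = e[0] + 1e-3 * rng.normal(size=k)  # make sample 1 a near-duplicate of sample 0
s[1] = s[0]

# weight stats by alpha and embeddings by (1 - alpha), then concatenate
features = np.hstack([(1 - alpha) * e, alpha * s])
seed_matrix = rng.standard_normal((D, k + 7))

bits = hash_bits(features, seed_matrix)
print(to_hex(bits[0]))
print(hamming(bits[0], bits[1]), hamming(bits[0], bits[2]))
```

&lt;p>The near-duplicate should land at a much smaller Hamming distance from the query than the unrelated sample, which is the property the retrieval step relies on.&lt;/p>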
&lt;p>The random convolutional encoder is very fast to fit and evaluate. It&amp;rsquo;s generally good at capturing the structure of the signal; however, the magnitude may be lost because we use a max pooling layer that introduces scale and shift invariance. Our similarity search should consider magnitude, which is important for our specific use case. Therefore, we add simple statistics about the magnitude and distribution of sample observations.&lt;/p>
&lt;p>Pros:&lt;/p>
&lt;ol>
&lt;li>Captures both signal structure (e.g. periodicity, stretches of missing data) and magnitude.&lt;/li>
&lt;li>Fast and scalable both computationally and storage-wise.&lt;/li>
&lt;li>Works on time series with missing values and of variable length.&lt;/li>
&lt;/ol>
&lt;p>Cons:&lt;/p>
&lt;ol>
&lt;li>Sacrifices some precision since we are using a random convolutional network where kernel weights, dilations, and strides are randomly selected.&lt;/li>
&lt;li>Hashing-based retrieval is an approximate nearest neighbor approach which also has lower precision compared to exact nearest neighbor search or similarity search using tree-based methods with a distance metric.&lt;/li>
&lt;/ol></description></item><item><title>Python project setup with Makefile and setup.py</title><link>https://asifr.com/python-project-setup/</link><pubDate>Tue, 08 Mar 2022 00:00:00 +0000</pubDate><guid>https://asifr.com/python-project-setup/</guid><description>
&lt;p>All my Python projects are set up in the same way: as a Python package with a Makefile. This is a template for that setup.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-console" data-lang="console">&lt;span style="display:flex;">&lt;span>$ tree .
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>.
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── Makefile
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── setup.py
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>├── myproject
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>│   ├── __init__.py
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>setup.py&lt;/code> file defines the requirements for the package, and the &lt;code>Makefile&lt;/code> defines the commands to run.&lt;/p>
&lt;p>The setup creates a virtual environment in &lt;code>.venv&lt;/code> directory, installs the package in editable mode, and installs the development dependencies. Editable installs are useful when developing a package, as changes to the code are immediately available without having to reinstall the package. If you add a new dependency to the &lt;code>setup.py&lt;/code> file, you will need to run &lt;code>make setup&lt;/code> again to install it.&lt;/p>
&lt;p>Here is an example &lt;code>setup.py&lt;/code> file:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> setuptools &lt;span style="color:#f00">import&lt;/span> setup
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>VERSION = &lt;span style="color:#87ceeb">&amp;#34;0.1.0&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>setup(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># package name, which can be different from project, this is the name used&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># when installing the package with pip, e.g. pip install mypackage&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> name=&lt;span style="color:#87ceeb">&amp;#34;mypackage&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> version=VERSION,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> maintainer=&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> maintainer_email=&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> description=&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> license=&lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> python_requires=&lt;span style="color:#87ceeb">&amp;#34;&amp;gt;=3.10&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># static files to include in the package&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> package_data={
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;myproject&amp;#34;&lt;/span>: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;var/*&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># command line entry points&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> entry_points={
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;console_scripts&amp;#34;&lt;/span>: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;myproject = myproject.__main__:main&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># fol&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> packages=[&lt;span style="color:#87ceeb">&amp;#34;myproject&amp;#34;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> install_requires=[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;pip&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;pyarrow==13.0.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;polars==0.18.15&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;awscli==1.29.38&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;boto3==1.28.38&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;botocore==1.31.38&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;python-dotenv==1.0.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;uvicorn==0.24.0.post1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;requests==2.31.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extras_require={
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;dev&amp;#34;&lt;/span>: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;ruff&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;ipykernel&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;pytest&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> include_package_data=&lt;span style="color:#f00">True&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> zip_safe=&lt;span style="color:#f00">False&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> classifiers=[
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Intended Audience :: Science/Research&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Programming Language :: Python :: 3.10&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Operating System :: OS Independent&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> keywords=&lt;span style="color:#87ceeb">&amp;#34;python&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>Makefile&lt;/code> looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-make" data-lang="make">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0">help&lt;/span>: &lt;span style="color:#0f0">## Show this help
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> @echo &lt;span style="color:#87ceeb">&amp;#34;\nSpecify a command. The choices are:\n&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @grep -E &lt;span style="color:#87ceeb">&amp;#39;^[0-9a-zA-Z_-]+:.*?## .*$$&amp;#39;&lt;/span> &lt;span style="color:#f00">$(&lt;/span>MAKEFILE_LIST&lt;span style="color:#f00">)&lt;/span> | awk &lt;span style="color:#87ceeb">&amp;#39;BEGIN {FS = &amp;#34;:.*?## &amp;#34;}; {printf &amp;#34; \033[0;36m%-12s\033[m %s\n&amp;#34;, $$1, $$2}&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> @echo &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0">.PHONY&lt;/span>: help
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0">clean&lt;/span>: &lt;span style="color:#0f0">## Clean
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> rm -rf ./.venv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rm -rf ./dist
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rm -rf ./mypackage.egg-info
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rm -rf ./mypackage/__pycache__
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rm -rf ./myproject/*.so
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rm -rf ./myproject/__pycache__/
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0">.PHONY&lt;/span>: clean
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0">setup&lt;/span>: &lt;span style="color:#0f0">## Editable install
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> test -d .venv || python3 -m venv .venv
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> . .venv/bin/activate; &lt;span style="color:#87ceeb">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">&lt;/span> python -m pip install --upgrade -i https://pypi.example.com/simple/ -e .[dev]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0">.PHONY&lt;/span>: setup
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0">server&lt;/span>: &lt;span style="color:#0f0">## Start local server
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> uvicorn myproject.app:app --reload --workers=&lt;span style="color:#f60">1&lt;/span> --reload-include=&lt;span style="color:#87ceeb">&amp;#34;./myproject*&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ff0">.PHONY&lt;/span>: server
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Running &lt;code>make&lt;/code> or &lt;code>make help&lt;/code> will show the available commands:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>Specify a command. The choices are:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> help Show this help
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> clean Clean
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> setup Editable install
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> server Start local server
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Taxonomy of health data for machine learning</title><link>https://asifr.com/health-data-types/</link><pubDate>Wed, 23 Feb 2022 00:00:00 +0000</pubDate><guid>https://asifr.com/health-data-types/</guid><description>
&lt;p>&lt;img src="https://asifr.com/images/icu-datatypes.png" alt="">&lt;/p>
&lt;p>There is a wide variety of data types collected in the health system that can be utilized by machine learning models. These can include&lt;/p>
&lt;ol>
&lt;li>patient-level information like demographics and socio-economic factors&lt;/li>
&lt;li>hospital encounter-level information like admission source, ICU unit type, and discharge location&lt;/li>
&lt;li>outcomes including diagnoses like billing codes and patient outcomes&lt;/li>
&lt;li>interventions a patient received in the hospital like medications, invasive mechanical ventilation, oxygen support, pressors, fluids, blood transfusions, and ECMO&lt;/li>
&lt;li>findings from radiological images, pathology images, and video recordings&lt;/li>
&lt;li>laboratory measurements like blood gases, metabolic panels, liver panels, lipid panels, complete blood count, urinalysis, urine output, microbiology, and omics data&lt;/li>
&lt;li>continuous waveforms like ECG, PPG, PCG, ABP, and etCO2 signals&lt;/li>
&lt;li>nurse charted or automated vital sign collection including temperature, heart rate, blood pressure, and oxygen saturation&lt;/li>
&lt;li>clinician and radiological notes&lt;/li>
&lt;/ol>
&lt;p>In-patient data collected in the hospital is linked to patients using a unique medical record number (MRN). Data collected in out-patient settings, including at home, in a nursing home, or in ambulatory care, may not be linked to the patient&amp;rsquo;s MRN. Even inside the hospital, linking waveforms (especially in time) with patient data in electronic health records is a significant challenge.&lt;/p></description></item><item><title>Cosine similarity 1D convolutions</title><link>https://asifr.com/cosine-similarity-1d-convolution/</link><pubDate>Mon, 31 Jan 2022 00:00:00 +0000</pubDate><guid>https://asifr.com/cosine-similarity-1d-convolution/</guid><description>
&lt;p>The cosine similarity function below provides sign and scale independent 1D convolutions. It has a learnable parameter $p$ where large values of $p$ increase the sharpness of the cosine similarity. Here $u$ is the signal and $v$ is the kernel (e.g. $[1,2,3]$).&lt;/p>
&lt;p>$$\text{CosineSim}(u,v)=\text{sign}(u \cdot v)\left|\frac{u \cdot v}{\lVert u \rVert_2 \, \lVert v \rVert_2}\right|^{p^2}$$&lt;/p>
&lt;p>We consider the kernel $[1,2,3]$ and a 1D time series signal with 5 distinct motifs:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>A&lt;/strong>. exact match $[1,2,3]$&lt;/li>
&lt;li>&lt;strong>B&lt;/strong>. negative sign, exact match $[-1,-2,-3]$&lt;/li>
&lt;li>&lt;strong>C&lt;/strong>. downscaled exact match $[0.2, 0.4, 0.6]$&lt;/li>
&lt;li>&lt;strong>D&lt;/strong>. median $[2,2,2]$&lt;/li>
&lt;li>&lt;strong>E&lt;/strong>. reversed $[3,2,1]$&lt;/li>
&lt;/ul>
&lt;p>We can compare a 1D convolution with &lt;code>kernel_size=3&lt;/code>, &lt;code>dilation = 1&lt;/code>, and &lt;code>padding = (kernel_size-1)*dilation = 2&lt;/code> against the cosine similarity distance.&lt;/p>
&lt;p>The 1D convolution correctly gives the largest-magnitude activations to both exact matches (&lt;strong>A&lt;/strong> and &lt;strong>B&lt;/strong>). However, it also gives large activations to parts of the signal where there should not be a match. The median sequence of values &lt;strong>D&lt;/strong> $[2,2,2]$ and the reversed sequence &lt;strong>E&lt;/strong> $[3,2,1]$ get high activations despite not matching the shape of the filter. The downscaled exact match &lt;strong>C&lt;/strong> is not selected by the convolution because its scale differs from that of the filter. &lt;em>Standard convolutions on raw (unnormalized) data are not scale and sign independent.&lt;/em> Common normalizations, like batch or layer norm, calculate normalizing terms over samples or channels but not point-wise. The resulting convolved signal requires max pooling to find the subsequences of greatest correlation with the filter.&lt;/p>
&lt;p>In contrast, the cosine similarity gives a score of 1 or -1 only for exact matches. &lt;em>The feature is detected independent of sign or scale.&lt;/em> The figure below shows that the cosine similarity distance correctly detects the motifs &lt;strong>A&lt;/strong>, &lt;strong>B&lt;/strong>, and &lt;strong>C&lt;/strong>. If we set the sharpness parameter to an arbitrarily large value such as $p=9$, then only the exact matches keep a score that is visibly different from zero.&lt;/p>
&lt;p>&lt;img src="images/convolution-cosine-similarity.png" alt="">&lt;/p>
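&lt;p>The comparison can be checked by hand on the five motifs. The following is a minimal standard-library sketch, not the batched PyTorch function shown later in this post: the helper names &lt;code>dot&lt;/code> and &lt;code>cosine_sim&lt;/code> are illustrative, and each motif is scored in isolation rather than as a window of a longer signal.&lt;/p>

```python
import math

def dot(u, v):
    """Plain dot product, i.e. one output sample of a convolution with kernel v."""
    return sum(a * b for a, b in zip(u, v))

def cosine_sim(u, v, p):
    """sign(u.v) * |cos(u, v)| ** (p ** 2) for two equal-length, non-zero sequences."""
    d = dot(u, v)
    cos = d / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    return math.copysign(abs(cos) ** (p ** 2), d)

kernel = [1, 2, 3]
motifs = {
    "A": [1, 2, 3],        # exact match
    "B": [-1, -2, -3],     # negative sign, exact match
    "C": [0.2, 0.4, 0.6],  # downscaled exact match
    "D": [2, 2, 2],        # median
    "E": [3, 2, 1],        # reversed
}

for name, u in motifs.items():
    raw = dot(u, kernel)                  # unnormalized convolution activation
    soft = cosine_sim(u, kernel, p=1)     # plain signed cosine similarity
    sharp = cosine_sim(u, kernel, p=9)    # sharpened with exponent p**2 = 81
    print(f"{name}: dot={raw:6.1f}  p=1: {soft:+.3f}  p=9: {sharp:+.3f}")
```

&lt;p>The raw dot products are 14, -14, 2.8, 12, and 10, so the plain convolution ranks &lt;strong>D&lt;/strong> and &lt;strong>E&lt;/strong> well above the downscaled match &lt;strong>C&lt;/strong>. With $p=1$ the cosine scores for &lt;strong>D&lt;/strong> and &lt;strong>E&lt;/strong> are roughly 0.93 and 0.71, and with $p=9$ they shrink to nearly zero while &lt;strong>A&lt;/strong>, &lt;strong>B&lt;/strong>, and &lt;strong>C&lt;/strong> stay at magnitude 1.&lt;/p>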
&lt;p>The output of CosineSim can be read directly as the points of maximal correlation between a signal subsequence and the filter, where filters represent subsequence templates.&lt;/p>
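&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> torch
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> torch.nn.functional &lt;span style="color:#f00">as&lt;/span> F
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">cosine_similarity&lt;/span>(signal, kernel, sharpness, padding):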
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Compute the cosine similarity distance between a signal and a kernel.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Outputs a sequence of the same length as the signal.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Parameters
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ----------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> signal : torch.Tensor
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> input [batch_size, channels, length]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> kernel : torch.Tensor
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> filter [kernel_size]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> sharpness : float
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> sharpness parameter
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> padding : int
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> padding size, (kernel size-1) * dilation
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> -------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> torch.Tensor
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> output [batch_size, channels, length]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kernel_size = kernel.size(-&lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> x = F.pad(signal, (padding,padding))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> x = x.unfold(&lt;span style="color:#f60">2&lt;/span>, kernel_size, &lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sim = F.cosine_similarity(x, kernel, dim=-&lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sgn = torch.sign(torch.einsum(&lt;span style="color:#87ceeb">&amp;#39;bdik,k-&amp;gt;bdi&amp;#39;&lt;/span>, x, kernel))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sim = sgn * torch.pow(torch.abs(sim), (sharpness**&lt;span style="color:#f60">2&lt;/span>))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sim = sim[:, :, : -padding].contiguous()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> sim
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Group-by and count in Numpy</title><link>https://asifr.com/groupby-count-numpy/</link><pubDate>Wed, 26 Jan 2022 00:00:00 +0000</pubDate><guid>https://asifr.com/groupby-count-numpy/</guid><description>
&lt;p>The &lt;code>crosstab&lt;/code> function takes a list of array-like objects and returns a contingency table of counts. A pure numpy implementation of a pivot-table like this is useful in environments where we don&amp;rsquo;t want to import the pandas package.&lt;/p>
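&lt;p>To make the shape of the result concrete before reading the numpy version, here is a standard-library sketch of the same kind of table. It is an illustration under assumptions, not the implementation in this post: &lt;code>crosstab_py&lt;/code> is a made-up name, and it returns nested lists instead of an array. The inputs are the ones used in the docstring example further down.&lt;/p>

```python
from collections import Counter

def crosstab_py(*args):
    """Pure-Python contingency table for equal-length categorical sequences."""
    # Sorted unique levels per variable (np.unique also returns sorted levels).
    levels = tuple(sorted(set(a)) for a in args)
    # Count how often each combination of values occurs across the variables.
    combo_counts = Counter(zip(*args))

    def build(depth, prefix):
        # Recursively build a nested list with one axis per input variable.
        if depth == len(levels):
            return combo_counts[prefix]  # Counter returns 0 for unseen combinations
        return [build(depth + 1, prefix + (lv,)) for lv in levels[depth]]

    return levels, build(0, ())

levels, count = crosstab_py([1, 3, 2, 3], [5, 3, 3, 4])
print(levels)  # ([1, 2, 3], [3, 4, 5])
print(count)   # [[0, 0, 1], [1, 0, 0], [1, 1, 0]]
```

&lt;p>Rows follow the levels of the first variable and columns the levels of the second, so &lt;code>count[0][2]&lt;/code> is the number of samples where the first variable is 1 and the second is 5.&lt;/p>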
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> typing &lt;span style="color:#f00">import&lt;/span> Tuple, List
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">crosstab&lt;/span>(*args) -&amp;gt; Tuple[Tuple[np.ndarray], np.ndarray]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Contingency table of counts.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Parameters
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ----------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> args : list of array-like
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Arrays of discrete categorical data.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> -------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> actual_levels : Tuple[np.ndarray]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> The actual levels of the categorical variables.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> count : np.ndarray
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> The counts of the categorical variables cross-tabulated.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Examples
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> --------
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; categorical = [1,3,2,3]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; covariate = [5,3,3,4]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;gt;&amp;gt;&amp;gt; levels, count = crosstab(categorical, covariate)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> levels, indices = zip(*[np.unique(a, return_inverse=&lt;span style="color:#f00">True&lt;/span>) &lt;span style="color:#f00">for&lt;/span> a in args])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> count = np.zeros(list(map(len, levels)), dtype=int)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> np.add.at(count, indices, &lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> levels, count
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Separable temporal convolutions</title><link>https://asifr.com/separable-temporal-convolutions/</link><pubDate>Sat, 22 Jan 2022 00:00:00 +0000</pubDate><guid>https://asifr.com/separable-temporal-convolutions/</guid><description>
&lt;p>Consider a multivariate time series $x \in \mathbb{R}^{B \times D \times T}$ with $D=3$ channels, $T=4$ timesteps, and batch size $B=1$.&lt;/p>
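&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> torch
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> torch &lt;span style="color:#f00">import&lt;/span> nn
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>x = [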
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">1&lt;/span>,&lt;span style="color:#f60">5&lt;/span>,&lt;span style="color:#f60">10&lt;/span>,&lt;span style="color:#f60">20&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">100&lt;/span>,&lt;span style="color:#f60">150&lt;/span>,&lt;span style="color:#f60">200&lt;/span>,&lt;span style="color:#f60">250&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">1000&lt;/span>,&lt;span style="color:#f60">1500&lt;/span>,&lt;span style="color:#f60">2000&lt;/span>,&lt;span style="color:#f60">2500&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># [batch_size=1, in_channels=3, timesteps=4]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>xt = torch.FloatTensor(x)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>xt = xt.unsqueeze(&lt;span style="color:#f60">0&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The separable convolution learns a group of filters for each channel independently, without interactions across channels. Below, I define a 1D convolutional layer with kernel size 2 and learn 1 filter per channel (&lt;code>num_channels=1&lt;/code>).&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>separable = &lt;span style="color:#f00">True&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>in_channels = xt.shape[&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>num_channels = &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>kernel_size = &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>stride = &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>layer_i = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>dilation_size = &lt;span style="color:#f60">2&lt;/span> ** layer_i
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>padding = (kernel_size - &lt;span style="color:#f60">1&lt;/span>) * dilation_size
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>groups = in_channels &lt;span style="color:#f00">if&lt;/span> separable &lt;span style="color:#f00">else&lt;/span> &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>out_channels = in_channels * num_channels
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>For illustrative purposes, the weights are initialized to 1 and bias to 0.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>conv1 = nn.Conv1d(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> in_channels,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out_channels,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kernel_size,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> stride=stride,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> padding=padding,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dilation=dilation_size,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> groups=groups,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>torch.nn.init.constant_(conv1.weight, &lt;span style="color:#f60">1&lt;/span>)
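&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>torch.nn.init.constant_(conv1.bias, &lt;span style="color:#f60">0&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># Apply the layer and drop the trailing padded steps so the output keeps the input length&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>out = conv1(xt)[:, :, :-padding]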
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A separable convolution with &lt;code>kernel_size=2&lt;/code> and &lt;code>num_channels=1&lt;/code> is simply a weighted sum over adjacent timesteps within each channel.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>tensor([[[ &lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">6&lt;/span>, &lt;span style="color:#f60">15&lt;/span>, &lt;span style="color:#f60">30&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [ &lt;span style="color:#f60">100&lt;/span>, &lt;span style="color:#f60">250&lt;/span>, &lt;span style="color:#f60">350&lt;/span>, &lt;span style="color:#f60">450&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">1000&lt;/span>, &lt;span style="color:#f60">2500&lt;/span>, &lt;span style="color:#f60">3500&lt;/span>, &lt;span style="color:#f60">4500&lt;/span>]]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Increasing &lt;code>num_channels&lt;/code> learns a set of independent filters for each channel. For example, &lt;code>num_channels=3&lt;/code> gives a total of &lt;code>num_channels*in_channels=9&lt;/code> output channels.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>tensor([[[ &lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">6&lt;/span>, &lt;span style="color:#f60">15&lt;/span>, &lt;span style="color:#f60">30&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [ &lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">6&lt;/span>, &lt;span style="color:#f60">15&lt;/span>, &lt;span style="color:#f60">30&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [ &lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">6&lt;/span>, &lt;span style="color:#f60">15&lt;/span>, &lt;span style="color:#f60">30&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [ &lt;span style="color:#f60">100&lt;/span>, &lt;span style="color:#f60">250&lt;/span>, &lt;span style="color:#f60">350&lt;/span>, &lt;span style="color:#f60">450&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [ &lt;span style="color:#f60">100&lt;/span>, &lt;span style="color:#f60">250&lt;/span>, &lt;span style="color:#f60">350&lt;/span>, &lt;span style="color:#f60">450&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [ &lt;span style="color:#f60">100&lt;/span>, &lt;span style="color:#f60">250&lt;/span>, &lt;span style="color:#f60">350&lt;/span>, &lt;span style="color:#f60">450&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">1000&lt;/span>, &lt;span style="color:#f60">2500&lt;/span>, &lt;span style="color:#f60">3500&lt;/span>, &lt;span style="color:#f60">4500&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">1000&lt;/span>, &lt;span style="color:#f60">2500&lt;/span>, &lt;span style="color:#f60">3500&lt;/span>, &lt;span style="color:#f60">4500&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">1000&lt;/span>, &lt;span style="color:#f60">2500&lt;/span>, &lt;span style="color:#f60">3500&lt;/span>, &lt;span style="color:#f60">4500&lt;/span>]]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When &lt;code>separable=False&lt;/code> and &lt;code>num_channels=1&lt;/code>, you get mixing between the channels:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>tensor([[[&lt;span style="color:#f60">1101&lt;/span>, &lt;span style="color:#f60">2756&lt;/span>, &lt;span style="color:#f60">3865&lt;/span>, &lt;span style="color:#f60">4980&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">1101&lt;/span>, &lt;span style="color:#f60">2756&lt;/span>, &lt;span style="color:#f60">3865&lt;/span>, &lt;span style="color:#f60">4980&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [&lt;span style="color:#f60">1101&lt;/span>, &lt;span style="color:#f60">2756&lt;/span>, &lt;span style="color:#f60">3865&lt;/span>, &lt;span style="color:#f60">4980&lt;/span>]]])
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Challenges in Machine Learning for Health</title><link>https://asifr.com/challenges-in-ml-health/</link><pubDate>Wed, 01 Dec 2021 00:00:00 +0000</pubDate><guid>https://asifr.com/challenges-in-ml-health/</guid><description>
&lt;p>The secondary analysis of health data is challenging due to confounding, bias, uncertainty, and missingness.&lt;/p>
&lt;h2 id="health-data-is-irregularly-sampled-in-time">Health data is irregularly sampled in time&lt;/h2>
&lt;p>Some data types are acquired more frequently than others. Vitals and laboratory measurements are taken for most in-patients. Clinical notes are also usually available for most patients. Ventilator settings, ECG signals, invasive blood pressure measurements, imaging data, and genomics data are available less often. In the intensive care unit (ICU), for instance, continuous signals from ECG and ABP are typically sampled at 125 Hz to 500 Hz. The raw ECG, ABP, and PPG signals can be downsampled to &amp;ldquo;high-frequency numerics&amp;rdquo; at 1 s or 1 min intervals. Some EHR databases further apply a median filter, so the highest sampling rate of vital signs like heart rate, blood pressure, temperature, and oxygen saturation is one value every 5 min. Ventilation settings and arterial blood gases may be charted every 6-12 hours. Other metabolic and liver panels are often ordered every 12-24 hours. These signals are acquired asynchronously, at irregular time intervals, and at different sampling rates.&lt;/p>
&lt;p>&lt;img src="https://asifr.com/images/eri-measurement-rate.png" alt="">&lt;/p>
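&lt;p>One simple way to put asynchronous measurements on a shared time axis is to carry the last observation forward onto a regular grid. The sketch below is only an illustration of that idea: the function name &lt;code>locf_on_grid&lt;/code>, the timestamps, and the values are all made up, and carrying values forward is one choice among several for handling irregular sampling.&lt;/p>

```python
from bisect import bisect_right

def locf_on_grid(times, values, grid):
    """Last observation carried forward: most recent value at or before each grid time."""
    out = []
    for t in grid:
        i = bisect_right(times, t)  # number of observations taken at or before t
        out.append(values[i - 1] if i else None)  # None marks "nothing measured yet"
    return out

# Hypothetical measurements; times are minutes since admission and must be sorted.
heart_rate = ([0, 5, 10, 15, 65, 70], [88, 90, 91, 89, 102, 99])  # frequent, with a gap
creatinine = ([30, 750], [1.1, 1.4])                               # ordered about every 12 hours

grid = [0, 60, 120, 720, 780]  # common time axis in minutes
hr_grid = locf_on_grid(*heart_rate, grid)
cr_grid = locf_on_grid(*creatinine, grid)

print(hr_grid)  # [88, 89, 99, 99, 99]
print(cr_grid)  # [None, 1.1, 1.1, 1.1, 1.4]
```

&lt;p>The heart rate value at 720 minutes is more than ten hours old by then, which is why it often helps to keep the time since the last measurement alongside the carried-forward value.&lt;/p>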
&lt;h2 id="adherence-to-a-standard-nomenclature">Adherence to a standard nomenclature&lt;/h2>
&lt;p>EHR databases may not follow a standard nomenclature (like SNOMED, LOINC, MDIL, etc&amp;hellip;), which introduces uncertainty in mapping measurements to standard concepts. OMOP is an example of an EHR database schema that attempts to heavily standardize concepts. However, the publicly accessible MIMIC and Philips eICU databases are not standardized and it is left up to the user to create concept mappings for medications, vital signs, laboratory measurements, and disease diagnoses. Many numeric fields like a laboratory measurement for creatinine can be entered as free text in the EHR user interface, which introduces further noise for the secondary analysis of EHR data.&lt;/p>
&lt;h2 id="critical-care-data-encodes-physiology-clinical-practice-patterns-and-clinicians-concern">Critical care data encodes physiology, clinical practice patterns, and clinician concern&lt;/h2>
&lt;p>Electronic health records encode more than a patient&amp;rsquo;s physiology. The pattern of measurements, lab orders, and treatments captures the clinical decisions made at the bedside. These decision patterns vary between institutions, between units in a hospital, and with the size and type of the institution (e.g. teaching vs community hospital). Disentangling physiology from clinical concern and care patterns is important to produce generalizable disease prediction models. However, this is not as important when using health data for operational research and to optimize hospital operations (e.g. forecasting bed occupancy).&lt;/p>
&lt;h2 id="mapping-patients-across-care-settings">Mapping patients across care settings&lt;/h2>
&lt;p>Connecting in-patient data collected in the hospital with out-patient data collected at home or in ambulatory care and emergency medical services is especially challenging. Even critical care data collected in the ICU can have periods of missing data where sensors like ECG, invasive ABP using an arterial line catheter, or PPG are missing for extended periods.&lt;/p>
&lt;h2 id="data-reliability">Data reliability&lt;/h2>
&lt;p>Nurse-charted measurements can sometimes be unreliable. As an example, respiratory rate charted by nurses at the bedside in the general ward is often rounded to a multiple of 5. Even continuously acquired signals can be unreliable. Timestamps from multiple sensors made by different vendors (and sometimes by the same vendor) can become desynchronized, and a device&amp;rsquo;s internal clock can drift. Some wearable devices may only save downsampled data instead of the raw signal.&lt;/p>
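&lt;p>One rough way to screen for this kind of rounding is to measure how often charted values land exactly on a multiple of 5. The sketch below is illustrative only: the function name and the example values are hypothetical, and what counts as a suspicious fraction depends on the variable and the unit.&lt;/p>

```python
import numpy as np


def multiple_of_five_fraction(values):
    """Fraction of non-missing values that are exact multiples of 5."""
    values = np.asarray(values, dtype=np.float64)
    values = values[np.isfinite(values)]  # drop NaN and inf
    if values.size == 0:
        return np.nan
    return float(np.mean(np.mod(values, 5) == 0))


# Hypothetical nurse-charted respiratory rates (breaths per minute).
charted = np.array([15, 20, 20, 18, 15, np.nan, 20, 25, 16, 20])
fraction = multiple_of_five_fraction(charted)
print(f"{fraction:.2f} of charted values are multiples of 5")
```

&lt;p>If the last digit were spread evenly, roughly one value in five would be a multiple of 5, so a much larger fraction hints at rounding during charting.&lt;/p>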
&lt;h2 id="dataset-and-concept-shift">Dataset and concept shift&lt;/h2>
&lt;p>Data drifts over time and differs across institutions, hospital units, and countries. For example, ventilation settings and disease progression will change as more caregivers adopt lung protective ventilation protocols; more patients have permissive hypertension and are on vasopressors in the neuro-ICU than in other units. Models that rely on interventions or physiological variables subject to dataset shift are susceptible to silent failure. Deployments need appropriate metrics to track dataset shift and model performance, and a pipeline to retrain and redeploy models.&lt;/p>
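&lt;p>As one example of a shift metric, the sketch below computes a population stability index (PSI) between a reference sample and a more recent sample of a single variable. PSI is my choice for illustration, not a metric named in this post, and the function name, the bin count, and the simulated heart rates are all assumptions.&lt;/p>

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples, using quantile bins from the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    current = np.asarray(current, dtype=np.float64)
    reference = reference[np.isfinite(reference)]
    current = current[np.isfinite(current)]
    # Interior quantile edges of the reference sample define the bins.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_counts = np.bincount(np.searchsorted(edges, reference), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(edges, current), minlength=n_bins)
    # Clip to avoid log(0) when a bin is empty in one of the samples.
    ref_frac = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_frac = np.clip(cur_counts / cur_counts.sum(), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


# Simulated (not real) heart rates before and after a shift in the mean.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=80.0, scale=10.0, size=5000)
shifted = rng.normal(loc=90.0, scale=10.0, size=5000)
psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, shifted)
```

&lt;p>Identical samples give a PSI of zero and the value grows as the two distributions diverge. Any alerting threshold would need to be chosen per variable.&lt;/p>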
&lt;h2 id="generalizability-of-machine-learning-models">Generalizability of machine learning models&lt;/h2>
&lt;p>Algorithms developed for the general ward (GW) setting may not translate to higher acuity settings, and vice versa. Most patients in the GW are not on continuous monitoring; instead, vitals are charted by nurses at irregular intervals. The severity of illness and treatment patterns also differ significantly between settings. Models should only be deployed, and with care, in the settings for which they were designed and tested.&lt;/p>
&lt;h2 id="reproducibility">Reproducibility&lt;/h2>
&lt;p>Reproducibility of the data extraction pipeline is important for documentation and verification. This becomes challenging for a large team working on different parts of the data extraction pipeline, with tasks like concept mapping, data labeling, cohort selection, and data pre-processing divided among scientists. Start by coordinating on data storage, naming conventions (for code and datasets), and tooling. I have found that documenting the pipeline in a single function call helps with posterity and reproducibility.&lt;/p>
&lt;h2 id="actionability">Actionability&lt;/h2>
&lt;p>Actionable algorithms are tied to a protocol or clinical decision support system that is integrated into the clinical workflow. For example, predicting that a patient is at high risk of hemodynamic shock is not useful on its own unless it is tied to a protocol that triggers an alert and a clinical decision support system that recommends appropriate fluids or pressors.&lt;/p></description></item><item><title>Quantile binning with missing data</title><link>https://asifr.com/binning/</link><pubDate>Sun, 03 Oct 2021 00:00:00 +0000</pubDate><guid>https://asifr.com/binning/</guid><description>
&lt;p>This uses NumPy and Numba for fast binning of numerical data into quantiles. It also supports missing data.&lt;/p>
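&lt;p>Before the full implementation, here is a minimal plain-NumPy sketch of the idea for a single column: compute interior quantiles as thresholds, then look up each value&amp;rsquo;s bin with a binary search. The helper name &lt;code>quantile_bin_column&lt;/code> is hypothetical, and this simplified version skips the handling of columns with few distinct values that the code below includes.&lt;/p>

```python
import numpy as np


def quantile_bin_column(x, n_bins=4):
    """Bin one column into quantile bins, keeping NaN as NaN."""
    x = np.asarray(x, dtype=np.float64)
    finite = x[np.isfinite(x)]
    # Interior quantiles (excluding 0% and 100%) act as bin thresholds.
    quantiles = np.linspace(0, 1, n_bins + 1)[1:-1]
    thresholds = np.unique(np.quantile(finite, quantiles))
    # side="left": a value equal to a threshold falls in the lower bin.
    codes = np.searchsorted(thresholds, x, side="left").astype(np.float64)
    codes[np.isnan(x)] = np.nan
    return codes, thresholds


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, np.nan])
codes, thresholds = quantile_bin_column(x, n_bins=4)
print(thresholds)
print(codes)
```

&lt;p>The bin codes are stored as floats so that missing values can stay NaN, which mirrors the float output of &lt;code>transform&lt;/code> below.&lt;/p>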
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> numba &lt;span style="color:#f00">import&lt;/span> njit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_find_binning_thresholds&lt;/span>(data, max_bins=&lt;span style="color:#f60">256&lt;/span>, subsample=int(&lt;span style="color:#f60">2e5&lt;/span>)):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> not (&lt;span style="color:#f60">2&lt;/span> &amp;lt;= max_bins &amp;lt;= &lt;span style="color:#f60">256&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">raise&lt;/span> ValueError(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;max_bins=&lt;/span>&lt;span style="color:#87ceeb">{}&lt;/span>&lt;span style="color:#87ceeb"> should be no smaller than 2 &amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;and no larger than 256.&amp;#34;&lt;/span>.format(max_bins)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> percentiles = np.linspace(&lt;span style="color:#f60">0&lt;/span>, &lt;span style="color:#f60">100&lt;/span>, num=max_bins + &lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> percentiles = percentiles[&lt;span style="color:#f60">1&lt;/span>:-&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> binning_thresholds = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> f_idx in range(data.shape[&lt;span style="color:#f60">1&lt;/span>]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> col_data = np.ascontiguousarray(data[:, f_idx], dtype=np.float64)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mask = np.isfinite(col_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> col_data = col_data[mask]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> distinct_values = np.unique(col_data)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> len(distinct_values) &amp;lt;= max_bins:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> midpoints = distinct_values[:-&lt;span style="color:#f60">1&lt;/span>] + distinct_values[&lt;span style="color:#f60">1&lt;/span>:]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> midpoints *= &lt;span style="color:#f60">0.5&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> midpoints = np.percentile(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> col_data, percentiles, method=&lt;span style="color:#87ceeb">&amp;#34;midpoint&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ).astype(np.float64)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> binning_thresholds.append(np.unique(midpoints))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> binning_thresholds
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_map_num_col_to_bins&lt;/span>(data, binning_thresholds, binned):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(data.shape[&lt;span style="color:#f60">0&lt;/span>]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> left, right = &lt;span style="color:#f60">0&lt;/span>, binning_thresholds.shape[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">while&lt;/span> left &amp;lt; right:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> middle = (right + left - &lt;span style="color:#f60">1&lt;/span>) // &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> data[i] &amp;lt;= binning_thresholds[middle]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> right = middle
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> left = middle + &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> binned[i] = left
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_map_to_bins&lt;/span>(data, binning_thresholds, binned):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Bin numerical values to discrete integer-coded levels.&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> feature_idx in range(data.shape[&lt;span style="color:#f60">1&lt;/span>]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _map_num_col_to_bins(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> data[:, feature_idx],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> binning_thresholds[feature_idx],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> binned[:, feature_idx],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">_assign_nan_to_bin&lt;/span>(binned, X, actual_n_bins, assign_nan_to_unique_bin=&lt;span style="color:#f00">False&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mask = np.isnan(X)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(X.shape[&lt;span style="color:#f60">1&lt;/span>]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> binned[mask[:, i], i] = actual_n_bins[i] &lt;span style="color:#f00">if&lt;/span> assign_nan_to_unique_bin &lt;span style="color:#f00">else&lt;/span> np.nan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> binned
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">class&lt;/span> QuantileBinning():
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">__init__&lt;/span>(self):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.bin_thresholds = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.n_bins = []
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">fit&lt;/span>(self, X):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.bin_thresholds = _find_binning_thresholds(X)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> self.n_bins = np.array(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> [thresholds.shape[&lt;span style="color:#f60">0&lt;/span>] + &lt;span style="color:#f60">1&lt;/span> &lt;span style="color:#f00">for&lt;/span> thresholds in self.bin_thresholds], dtype=np.uint32
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> )
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">transform&lt;/span>(self, X, assign_nan_to_unique_bin=&lt;span style="color:#f00">False&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> binned = np.zeros_like(X, dtype=np.float32, order=&lt;span style="color:#87ceeb">&amp;#34;F&amp;#34;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> _map_to_bins(X, self.bin_thresholds, binned)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> binned = _assign_nan_to_bin(binned, X, self.n_bins, assign_nan_to_unique_bin)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> binned
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Ensemble decision trees in Numba</title><link>https://asifr.com/ensemble-decision-tree-numba/</link><pubDate>Tue, 21 Sep 2021 00:00:00 +0000</pubDate><guid>https://asifr.com/ensemble-decision-tree-numba/</guid><description>
&lt;p>Representing ensembles of decision trees using NumPy arrays with fast Numba operations.&lt;/p>
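&lt;p>The representation relies on storing a complete binary tree in heap order, where the children of internal node &lt;code>i&lt;/code> sit at &lt;code>2*i + 1&lt;/code> and &lt;code>2*i + 2&lt;/code>. As a small sketch of that indexing for one sample and one tree, without Numba (the function name &lt;code>leaf_index&lt;/code> and the example tree are made up for illustration):&lt;/p>

```python
import numpy as np


def leaf_index(x, features, thresholds):
    """Walk one sample down a complete binary tree stored in heap order."""
    n_internal = len(features)
    node = 0  # root
    while n_internal > node:  # stop once we step past the internal nodes
        go_right = int(x[features[node]] > thresholds[node])
        node = 2 * node + 1 + go_right  # left child, or right child if go_right
    return node - n_internal  # renumber leaves to start from 0


# Hypothetical depth-2 tree: 3 internal nodes and 4 leaves.
features = np.array([0, 1, 1])
thresholds = np.array([0.5, 0.2, 0.8])
left_most = leaf_index(np.array([0.3, 0.1]), features, thresholds)
right_most = leaf_index(np.array([0.7, 0.9]), features, thresholds)
```

&lt;p>The code below applies the same rule to all samples at once and to many trees.&lt;/p>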
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> math
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> numba &lt;span style="color:#f00">import&lt;/span> njit, prange
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">take&lt;/span>(X, inds):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Multidimensional indexing for numba&amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = len(X)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> y = np.zeros(n, dtype=X.dtype)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(n):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> y[i] = X[i, inds[i]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> y
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">next_node&lt;/span>(node_id, value, thr):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;A vectorized operation to find the next node in a binary tree given a
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> value and threshold.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> node_id: int or array of the current node, root node_id = 0
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Return:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> new node id, left node when value &amp;lt;= thr, right node when value &amp;gt; thr
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> (node_id &amp;lt;&amp;lt; &lt;span style="color:#f60">1&lt;/span>) + &lt;span style="color:#f60">1&lt;/span> + (&lt;span style="color:#f60">1&lt;/span> * (thr &amp;lt; value))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">leaf&lt;/span>(X, features, thresholds, reset_leaf_index=&lt;span style="color:#f60">1&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Find the leaf node index along a decision path given a tree feature
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> indices, thresholds, and design matrix.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> X: 2D design matrix of shape [nsamples, nfeatures]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> features: feature indices as dtype np.int32 of shape [internal_nodes]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> thresholds: split thresholds as dtype np.float64 of shape [internal_nodes]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> reset_leaf_index: if 1, renumber leaves so the leaf index starts from 0
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nsamples = len(X)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> node_id = np.zeros(nsamples, dtype=np.int64)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = len(features)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> depth = int(round(math.log(n+&lt;span style="color:#f60">1&lt;/span>)/math.log(&lt;span style="color:#f60">2&lt;/span>)))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> internal_nodes = &lt;span style="color:#f60">2&lt;/span>**(depth) - &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(depth):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> feature_ind = features[node_id]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> value = take(X, feature_ind)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> thr = thresholds[node_id]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> node_id = next_node(node_id, value, thr)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> reset_leaf_index == &lt;span style="color:#f60">1&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> node_id = node_id - internal_nodes
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> node_id
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">leaf_tokens&lt;/span>(X, trees, nleaves_per_tree):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Tokenize a design matrix with leaf indices.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> X: 2D design matrix of shape [nsamples, nfeatures]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> trees: 3D matrix defining an ensemble of decision trees of shape
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> [ntrees, internal_nodes, 2]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> nleaves_per_tree: number of leaves in each decision tree, 2**depth
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> nsamples = X.shape[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ntrees = trees.shape[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> leaves = np.zeros((nsamples,ntrees), dtype=np.int64)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in prange(&lt;span style="color:#f60">0&lt;/span>,ntrees):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> features = trees[i,:,&lt;span style="color:#f60">0&lt;/span>].astype(np.int64)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> thresholds = trees[i,:,&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> leaves[:,i] = leaf(X, features, thresholds) + nleaves_per_tree * i
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> leaves
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">random_ensemble_decision_trees&lt;/span>(ntrees, depth, nfeatures):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Generate a random ensemble of decision trees
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> ntrees: number of trees in ensemble
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> depth: height of each tree
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> nfeatures: number of features in design matrix
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> trees: 3D matrix of shape [ntrees, internal_nodes, 2] where internal_nodes
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> is the number of non-leaf nodes (including root node) calculated
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> as 2^&lt;/span>&lt;span style="color:#87ceeb">{depth}&lt;/span>&lt;span style="color:#87ceeb"> - 1 and the last dimension includes feature index and
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> splitting thresholds
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> internal_nodes = &lt;span style="color:#f60">2&lt;/span>**(depth) - &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total_internal_nodes = ntrees * internal_nodes
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trees = np.zeros((total_internal_nodes, &lt;span style="color:#f60">2&lt;/span>)) &lt;span style="color:#0f0"># [features, thresholds]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trees[:,&lt;span style="color:#f60">0&lt;/span>] = np.random.randint(&lt;span style="color:#f60">0&lt;/span>, nfeatures, total_internal_nodes)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trees[:,&lt;span style="color:#f60">1&lt;/span>] = np.random.rand(total_internal_nodes)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> trees = trees.reshape(ntrees, internal_nodes, -&lt;span style="color:#f60">1&lt;/span>) &lt;span style="color:#0f0"># [ntrees, internal_nodes, 2]&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> trees
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>nsamples, nfeatures = &lt;span style="color:#f60">1000&lt;/span>, &lt;span style="color:#f60">30&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>X = np.random.rand(nsamples,nfeatures)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>ntrees = &lt;span style="color:#f60">1000&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>depth = &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>nleaves = &lt;span style="color:#f60">2&lt;/span>**depth
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>total_nleaves = ntrees * nleaves
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>trees = random_ensemble_decision_trees(ntrees=ntrees, depth=depth, nfeatures=nfeatures)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>leaves = leaf_tokens(X, trees, nleaves)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Feature engineering for time series data using Numba</title><link>https://asifr.com/feature-engineering-time-series-numba/</link><pubDate>Tue, 07 Sep 2021 00:00:00 +0000</pubDate><guid>https://asifr.com/feature-engineering-time-series-numba/</guid><description>
&lt;div style="background:#FFF;text-align:center;">
&lt;p>&lt;img src="https://asifr.com/images/feature-engineering.svg" alt="">&lt;/p>
&lt;/div>
&lt;p>&lt;em>Feature engineering over a multivariate time series with missing data.&lt;/em>&lt;/p>
&lt;p>Given a sequence of measurements &lt;code>values: np.ndarray&lt;/code> and observation times &lt;code>times: np.ndarray&lt;/code>, we want to engineer features for a machine learning model that capture the temporal trends and statistics over different temporal windows. Our data is irregularly sampled and has missing values.&lt;/p>
&lt;p>Features representing the magnitude, dispersion, direction of change, and temporal trends are derived. Every time point is represented with 4 categories of features:&lt;/p>
&lt;ul>
&lt;li>Magnitude of the most recent observation within the last 6h for vital signs and 24h for laboratory measurements.&lt;/li>
&lt;li>Dispersion measured as the range over a short and long-time window.&lt;/li>
&lt;li>Direction of change (increasing, decreasing, no change) over a short and long-time window.&lt;/li>
&lt;li>Exponential moving averages (EMA) with varying decay rates that specify how much impact each past observation has on the current mean. EMA features were calculated on the forward filled magnitudes using an EMA algorithm designed for irregularly sampled time series.&lt;/li>
&lt;/ul>
&lt;p>The engineered features include:&lt;/p>
&lt;ol>
&lt;li>&lt;code>dt&lt;/code>: time elapsed since the measurement was made&lt;/li>
&lt;li>&lt;code>val&lt;/code>: most recent measurement, measurements are forward filled up to a maximum duration after the observation&lt;/li>
&lt;li>&lt;code>srng&lt;/code>: range as $\frac{x_{max}-x_{min}}{x_{max}+x_{min}} \times 100$ over a &lt;em>short window&lt;/em>&lt;/li>
&lt;li>&lt;code>ssgn&lt;/code>: sign of the change (-1 or +1) between the first and last measurement in a &lt;em>short window&lt;/em>, windows without measurements are filled with 0&lt;/li>
&lt;li>&lt;code>lrng&lt;/code>: range as (max-min)/(max+min) * 100 over a &lt;em>long window&lt;/em>&lt;/li>
&lt;li>&lt;code>lsgn&lt;/code>: sign of the change (-1 or +1) between the first and last measurement in a &lt;em>long window&lt;/em>, windows without measurements are filled with 0&lt;/li>
&lt;li>&lt;code>sema&lt;/code>: slow exponential moving average, calculated after forward filling&lt;/li>
&lt;li>&lt;code>fema&lt;/code>: fast exponential moving average, calculated after forward filling&lt;/li>
&lt;/ol>
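&lt;p>To make the range and sign features concrete, here is a small sketch for a single window ending at one time point. The function name and the window convention (open on the left, closed on the right) are assumptions on my part, as is returning 0 for the range when the window is empty. It expects &lt;code>times&lt;/code> to be sorted.&lt;/p>

```python
import numpy as np


def window_range_and_sign(times, values, t_now, window):
    """Range (percent) and sign of change over (t_now - window, t_now]."""
    times = np.asarray(times, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    in_window = np.logical_and.reduce(
        [np.isfinite(values), times > t_now - window, t_now >= times]
    )
    window_values = values[in_window]
    if window_values.size == 0:
        return 0.0, 0.0  # windows without measurements are filled with 0
    vmax, vmin = window_values.max(), window_values.min()
    denom = vmax + vmin
    rng = float(100.0 * (vmax - vmin) / denom) if denom != 0 else 0.0
    # Sign of the change between the first and last measurement.
    sgn = float(np.sign(window_values[-1] - window_values[0]))
    return rng, sgn


# Hypothetical measurements with times in minutes.
times = np.array([0.0, 30.0, 60.0, 90.0, 120.0])
values = np.array([80.0, 90.0, np.nan, 100.0, 70.0])
srng, ssgn = window_range_and_sign(times, values, t_now=120.0, window=60.0)
```

&lt;p>With a 60 minute window ending at 120 minutes, only the measurements at 90 and 120 minutes are used, so the range is 30/170 expressed as a percentage and the sign is -1.&lt;/p>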
&lt;p>We also need to define the following settings for every variable:&lt;/p>
&lt;ol>
&lt;li>&lt;code>forward_fill_duration&lt;/code>: duration in minutes to forward fill a missing variable&lt;/li>
&lt;li>&lt;code>short_rolling_window_size&lt;/code>: short rolling window size in minutes&lt;/li>
&lt;li>&lt;code>long_rolling_window_size&lt;/code>: long rolling window size in minutes&lt;/li>
&lt;li>&lt;code>slow_ema_tau&lt;/code>: decay rate for slow exponential moving average in minutes&lt;/li>
&lt;li>&lt;code>fast_ema_tau&lt;/code>: decay rate for fast exponential moving average in minutes&lt;/li>
&lt;/ol>
&lt;p>Some derived features have no missing data, including &lt;code>dt&lt;/code>, &lt;code>ssgn&lt;/code>, &lt;code>srng&lt;/code>, &lt;code>lsgn&lt;/code>, and &lt;code>lrng&lt;/code>. Other variables like &lt;code>val&lt;/code>, &lt;code>sema&lt;/code>, &lt;code>fema&lt;/code> have missing values encoded as NaN. Missing data is forward filled up to &lt;code>forward_fill_duration&lt;/code> and all missing values after the forward filling time are NaN.&lt;/p>
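&lt;p>The forward filling rule can be sketched as follows. This is an illustration under my own assumptions, not the implementation used for the features above: the function name is hypothetical, &lt;code>times&lt;/code> is assumed sorted, and time points before the first observation are left as NaN in both outputs.&lt;/p>

```python
import numpy as np


def forward_fill(times, values, max_duration):
    """Carry the last observation forward for at most max_duration."""
    times = np.asarray(times, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    n = len(values)
    dt = np.full(n, np.nan)   # time since the last observation
    val = np.full(n, np.nan)  # forward filled value
    last = -1                 # index of the last observed value
    for i in range(n):
        if np.isfinite(values[i]):
            last = i
        if last >= 0:
            dt[i] = times[i] - times[last]
            if max_duration >= dt[i]:
                val[i] = values[last]
    return dt, val


# Hypothetical lab values with times in minutes.
times = np.array([0.0, 60.0, 120.0, 180.0, 480.0])
values = np.array([7.4, np.nan, np.nan, 7.2, np.nan])
dt, val = forward_fill(times, values, max_duration=120.0)
```

&lt;p>The last time point is 300 minutes after the most recent observation, which exceeds the 120 minute limit, so &lt;code>val&lt;/code> is NaN there while &lt;code>dt&lt;/code> still records the elapsed time.&lt;/p>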
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> numba &lt;span style="color:#f00">import&lt;/span> njit, prange
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">ema&lt;/span>(times, values, tau):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Exponential moving average for irregularly sampled time series.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Units for ``times`` and ``tau`` should be in hours.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> times (np.ndarray): 1D array of measurement times in hours
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> values (np.ndarray): 1D array of measurements
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> tau (float): time decay in hours
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = len(values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ret = np.empty(n, dtype=np.float64) * np.nan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ret[&lt;span style="color:#f60">0&lt;/span>] = values[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> last_i = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> i in range(&lt;span style="color:#f60">1&lt;/span>, n):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> np.isnan(times[i]) | np.isnan(values[i]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">continue&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> alpha = (times[i] - times[last_i]) / tau
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> w = np.exp(-alpha)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> alpha &amp;gt; &lt;span style="color:#f60">1e-6&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> w2 = (&lt;span style="color:#f60">1&lt;/span> - w) / alpha
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># use Taylor expansion for numerical stability&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> w2 = &lt;span style="color:#f60">1&lt;/span> - (alpha / &lt;span style="color:#f60">2&lt;/span>) + (alpha * alpha / &lt;span style="color:#f60">6&lt;/span>) - (alpha * alpha * alpha / &lt;span style="color:#f60">24&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ret[i] = (ret[last_i] * w) + (values[i] * (&lt;span style="color:#f60">1&lt;/span> - w2)) + (values[last_i] * (w2 - w))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> last_i = i
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> ret
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">ffill&lt;/span>(times, values):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Forward fill an array of values and times.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Times indicate the time of last observation.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> times (np.ndarray): array of times
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> values (np.ndarray): array of values with nan
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 2D array with forward filled times (index 0)
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> and values (index 1).
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out_values = values.copy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out_times = times.copy()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = values.shape[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out = np.zeros((n, &lt;span style="color:#f60">2&lt;/span>), dtype=np.float64)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lastval = np.nan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lasttime = np.nan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> row_idx in range(n):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> np.isfinite(values[row_idx]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lastval = values[row_idx]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> lasttime = times[row_idx]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out_values[row_idx] = lastval
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out_times[row_idx] = lasttime
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out[:, &lt;span style="color:#f60">0&lt;/span>] = out_times
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> out[:, &lt;span style="color:#f60">1&lt;/span>] = out_values
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> out
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">sliding_windows&lt;/span>(times, window_size):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> tdiff = times - times.reshape(-&lt;span style="color:#f60">1&lt;/span>, &lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> twindows = (tdiff &amp;gt;= &lt;span style="color:#f60">0&lt;/span>) &amp;amp; (tdiff &amp;lt;= window_size)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> twindows
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>@njit()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">feature_engineering&lt;/span>(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> times,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> values,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> forward_fill_duration,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> short_rolling_window_size,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> long_rolling_window_size,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> slow_ema_tau,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> fast_ema_tau,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Feature engineering.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 1. Age of the observation
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 2. Most recent forward-filled measurement
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 3. Dispersion as (max-min)/(max+min) over short window
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 4. Sign of change between first and last value in short window
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 5. Dispersion as (max-min)/(max+min) over long window
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 6. Sign of change between first and last value in long window
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 7. slow EMA
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 8. fast EMA
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Forward filling replaces nan values up to a maximum duration given by
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> `forward_fill_duration`. Any remaining missing values should be imputed
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> with the median or by sampling from a standard reference range.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> times (np.ndarray): 1D array of observation times
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> values (np.ndarray): 1D array of observations, with nan for missing values
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> forward_fill_duration (float): duration to forward fill missing values
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> short_rolling_window_size (float): observation duration for rolling
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> statistics with a short lookback window
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> long_rolling_window_size (float): observation duration for rolling
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> statistics with a long lookback window
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> slow_ema_tau (float): slow decay rate weights more of the past
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> fast_ema_tau (float): fast decay rate weights more of the present
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 2D matrix with shape [n_samples, 8], where the features are:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> suffixes = [&amp;#39;dt&amp;#39;, &amp;#39;val&amp;#39;, &amp;#39;short_rng&amp;#39;, &amp;#39;short_sgn&amp;#39;, &amp;#39;long_rng&amp;#39;, &amp;#39;long_sgn&amp;#39;, &amp;#39;slow_ema&amp;#39;, &amp;#39;fast_ema&amp;#39;]
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> DT_IX = &lt;span style="color:#f60">0&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> VAL_IX = &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> SRNG_IX = &lt;span style="color:#f60">2&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> SSGN_IX = &lt;span style="color:#f60">3&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> LRNG_IX = &lt;span style="color:#f60">4&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> LSGN_IX = &lt;span style="color:#f60">5&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> SEMA_IX = &lt;span style="color:#f60">6&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> FEMA_IX = &lt;span style="color:#f60">7&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> n = values.shape[&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived = np.zeros((n, &lt;span style="color:#f60">8&lt;/span>)) * np.nan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> missing = np.isnan(values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># no observations: leave dt, val, and the EMAs as NaN and zero the sign and range features&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> np.all(missing):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[:, SSGN_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[:, LSGN_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[:, SRNG_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[:, LRNG_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> derived
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># indicate samples within a fixed window for each sample&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> short_windows = sliding_windows(times, short_rolling_window_size)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> short_windows = short_windows &amp;amp; ~missing.reshape(-&lt;span style="color:#f60">1&lt;/span>,&lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> long_windows = sliding_windows(times, long_rolling_window_size)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> long_windows = long_windows &amp;amp; ~missing.reshape(-&lt;span style="color:#f60">1&lt;/span>,&lt;span style="color:#f60">1&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># forward fill&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> xfill = ffill(times, values)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dt = times - xfill[:,&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 0. Age of the measurement&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[:, DT_IX] = dt
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 1. last measured value&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[:, VAL_IX] = xfill[:, &lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expired = ~np.isnan(dt)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> expired[expired] = dt[expired] &amp;gt;= forward_fill_duration
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[expired, VAL_IX] = np.nan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># rolling statistics&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">for&lt;/span> t in range(n):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># samples in this window&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> short_inds = short_windows[:, t]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> long_inds = long_windows[:, t]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> short_vals = values[short_inds]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> long_vals = values[long_inds]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> short_vals.size &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 2. rolling dispersion&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> high = np.max(short_vals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> low = np.min(short_vals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total = high + low
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> total &amp;gt; &lt;span style="color:#f60">1e-6&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, SRNG_IX] = (high - low) / (total + &lt;span style="color:#f60">1e-6&lt;/span>) * &lt;span style="color:#f60">100&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, SRNG_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 3. sign of change between last and first value&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> short_vals.size &amp;gt;= &lt;span style="color:#f60">2&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, SSGN_IX] = np.sign(short_vals[-&lt;span style="color:#f60">1&lt;/span>] - short_vals[&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, SSGN_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t,SRNG_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t,SSGN_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> long_vals.size &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 4. rolling dispersion&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> high = np.max(long_vals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> low = np.min(long_vals)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> total = high + low
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> total &amp;gt; &lt;span style="color:#f60">1e-6&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, LRNG_IX] = (high - low) / (total + &lt;span style="color:#f60">1e-6&lt;/span>) * &lt;span style="color:#f60">100&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, LRNG_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 5. sign of change between last and first value&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> long_vals.size &amp;gt;= &lt;span style="color:#f60">2&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, LSGN_IX] = np.sign(long_vals[-&lt;span style="color:#f60">1&lt;/span>] - long_vals[&lt;span style="color:#f60">0&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, LSGN_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, LRNG_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[t, LSGN_IX] = &lt;span style="color:#f60">0.&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># exponential moving average using the forward filled data&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ema_slow = np.empty(n, dtype=np.float64) * np.nan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ema_fast = np.empty(n, dtype=np.float64) * np.nan
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> m = np.isfinite(derived[:, &lt;span style="color:#f60">1&lt;/span>])
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> vals = derived[m, &lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> vals.size &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ema_slow = ema(times[m], derived[m, &lt;span style="color:#f60">1&lt;/span>], slow_ema_tau)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ema_fast = ema(times[m], derived[m, &lt;span style="color:#f60">1&lt;/span>], fast_ema_tau)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 6. slow EMA&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[m, SEMA_IX] = ema_slow
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0"># 7. fast EMA&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> derived[m, FEMA_IX] = ema_fast
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> derived
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Select a random window of maximum duration in NumPy</title><link>https://asifr.com/random-window-numpy/</link><pubDate>Mon, 02 Aug 2021 00:00:00 +0000</pubDate><guid>https://asifr.com/random-window-numpy/</guid><description>
&lt;p>Given a sequence of indices $x$ and times $t$, we want to select a window
of data from $x$ with a maximum duration and a maximum number of measurements.&lt;/p>
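&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> random
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">select_random_window&lt;/span>(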
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> x: np.ndarray, t: np.ndarray, max_window_size: int, dur: float
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Select a random window of consecutive elements with a maximum duration.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> This function has extra logic that ensures the selected windows have length
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> max_window_size whenever possible.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Args:
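&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> x (np.ndarray): sequence of indices
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> t (np.ndarray): observation times for each element of x, in increasing order
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> max_window_size (int): maximum number of observations
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> dur (float): maximum window duration, in the same units as t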
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> 1D array of samples in the observation window.
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> L = len(x)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> L &amp;lt;= max_window_size:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> x
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> (t[-&lt;span style="color:#f60">1&lt;/span>] - t[&lt;span style="color:#f60">0&lt;/span>]) &amp;lt;= dur:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> x
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> L_tmp = L
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> L &amp;lt; max_window_size - &lt;span style="color:#f60">6&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> L_tmp = L - int(max_window_size / &lt;span style="color:#f60">2&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">else&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> L_tmp = L - &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start = random.randint(&lt;span style="color:#f60">0&lt;/span>, L_tmp)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t_start = t[start]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> t_end = t_start + dur
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> t_end &amp;gt; t[-&lt;span style="color:#f60">1&lt;/span>]:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> x[start:]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> end = np.argwhere((t - t_end) &amp;gt;= &lt;span style="color:#f60">0&lt;/span>)[&lt;span style="color:#f60">0&lt;/span>][&lt;span style="color:#f60">0&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> start = start &lt;span style="color:#f00">if&lt;/span> end &amp;lt;= L &lt;span style="color:#f00">else&lt;/span> max(&lt;span style="color:#f60">0&lt;/span>, start - (end - L))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> x[start:end]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Causal inference learners</title><link>https://asifr.com/causal-inference-learners/</link><pubDate>Mon, 01 Feb 2021 00:00:00 +0000</pubDate><guid>https://asifr.com/causal-inference-learners/</guid><description>
&lt;p>We are given covariates $X$, a treatment indicator $W$, and a binary outcome $Y \in \{0,1\}$.&lt;/p>
&lt;h2 id="inverse-probability-of-treatment-weights">Inverse probability of treatment weights&lt;/h2>
&lt;p>Fit a model on the covariates $X$ to predict the treatment $W$, $p_{w_i}(x_i)=P(W=w_i|X=x_i)$, and use the inverse of the probability of the treatment each sample actually received as its sample weight when fitting the model for the outcome $Y$.&lt;/p>
&lt;p>$$IPTW_i=\frac{1}{p_{w_i}(x_i)}$$&lt;/p>
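&lt;p>As a minimal sketch of the weights, assume a single binary covariate so that the propensity model can be replaced by the treated fraction within each stratum, and use a weighted mean per arm as the simplest weighted outcome model. The toy data and the helper names (&lt;code>propensity_by_stratum&lt;/code>, &lt;code>iptw&lt;/code>, &lt;code>weighted_mean&lt;/code>) are assumptions for illustration; in practice the propensity would come from a fitted classifier.&lt;/p>

```python
from collections import Counter

# Hypothetical toy data: binary covariate x, treatment w, binary outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [0, 0, 0, 1, 0, 1, 1, 1]
y = [0, 0, 1, 1, 0, 1, 1, 0]


def propensity_by_stratum(x, w):
    """Estimate P(W=1 | X=x) as the treated fraction in each stratum of x."""
    totals = Counter(x)
    treated = Counter(xi for xi, wi in zip(x, w) if wi == 1)
    return {k: treated[k] / totals[k] for k in totals}


def iptw(x, w):
    """Weight each sample by 1 / P(W = w_i | X = x_i).

    Assumes every stratum contains both treated and control samples,
    otherwise a probability of zero would appear in the denominator.
    """
    e = propensity_by_stratum(x, w)
    return [1.0 / (e[xi] if wi == 1 else 1.0 - e[xi]) for xi, wi in zip(x, w)]


def weighted_mean(values, weights):
    """Mean of values with non-negative sample weights."""
    return sum(v * k for v, k in zip(values, weights)) / sum(weights)


weights = iptw(x, w)

# Weighted outcome means within each arm estimate E[Y^1] and E[Y^0].
treated = [(yi, k) for yi, k, wi in zip(y, weights, w) if wi == 1]
control = [(yi, k) for yi, k, wi in zip(y, weights, w) if wi == 0]
ey1 = weighted_mean([yi for yi, _ in treated], [k for _, k in treated])
ey0 = weighted_mean([yi for yi, _ in control], [k for _, k in control])
risk_difference = ey1 - ey0
```

&lt;p>Samples that received a treatment that was unlikely given their covariates get the largest weights, which rebalances the covariate distribution between the two arms.&lt;/p>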
&lt;h2 id="s-learner">S-learner&lt;/h2>
&lt;p>A single model is trained on the covariates with the treatment indicator included as an additional feature.&lt;/p>
&lt;p>$$\begin{aligned}
\hat{\mu} = M(Y \sim (X,W))\\
\hat{\tau}(x)=\hat{\mu}(x,1) - \hat{\mu}(x,0)
\end{aligned}$$&lt;/p>
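&lt;p>A minimal sketch of the S-learner, assuming a single binary covariate and using per-cell outcome means as a stand-in for the model $M$. The toy data and the helper &lt;code>fit_cell_means&lt;/code> are assumptions for illustration; any regressor or classifier could take its place.&lt;/p>

```python
from collections import defaultdict

# Hypothetical toy data: binary covariate x, treatment w, binary outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [0, 0, 0, 1, 0, 1, 1, 1]
y = [0, 0, 1, 1, 0, 1, 1, 0]


def fit_cell_means(features, targets):
    """Stand-in for M: predict the mean target for each distinct feature value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for f, t in zip(features, targets):
        sums[f] += t
        counts[f] += 1
    return {f: sums[f] / counts[f] for f in sums}


# One model fitted on the covariate and the treatment jointly.
mu = fit_cell_means(list(zip(x, w)), y)


def tau(xi):
    """Estimated treatment effect: mu(x, 1) - mu(x, 0)."""
    return mu[(xi, 1)] - mu[(xi, 0)]


effects = {xi: tau(xi) for xi in sorted(set(x))}
```

&lt;p>The effect is read off by querying the same fitted model twice, once with the treatment set to 1 and once with it set to 0.&lt;/p>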
&lt;h2 id="t-learner">T-learner&lt;/h2>
&lt;p>Two models, one per treatment arm, are trained on the covariates only, each using the samples from its own arm.&lt;/p>
&lt;p>$$\begin{aligned}
\hat{\mu}_0 = M_0(Y^0 \sim X^0)\\
\hat{\mu}_1 = M_1(Y^1 \sim X^1)\\
\hat{\tau}(x)=\hat{\mu}_1(x)-\hat{\mu}_0(x)
\end{aligned}$$&lt;/p>
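&lt;p>A minimal sketch of the T-learner, again assuming a single binary covariate and per-cell outcome means in place of $M_0$ and $M_1$. The toy data and the helpers &lt;code>fit_cell_means&lt;/code> and &lt;code>arm&lt;/code> are assumptions for illustration.&lt;/p>

```python
from collections import defaultdict

# Hypothetical toy data: binary covariate x, treatment w, binary outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [0, 0, 0, 1, 0, 1, 1, 1]
y = [0, 0, 1, 1, 0, 1, 1, 0]


def fit_cell_means(features, targets):
    """Stand-in for a model: predict the mean target for each feature value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for f, t in zip(features, targets):
        sums[f] += t
        counts[f] += 1
    return {f: sums[f] / counts[f] for f in sums}


def arm(values, w, which):
    """Select the entries of values whose treatment equals which."""
    return [v for v, wi in zip(values, w) if wi == which]


# One model per treatment arm, each fitted on the covariate only.
mu0 = fit_cell_means(arm(x, w, 0), arm(y, w, 0))
mu1 = fit_cell_means(arm(x, w, 1), arm(y, w, 1))

# Treatment effect as the difference between the two arm models.
effects = {xi: mu1[xi] - mu0[xi] for xi in sorted(set(x))}
```

&lt;p>The treatment indicator never enters either model as a feature; it only decides which samples each model sees.&lt;/p>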
&lt;h2 id="x-learner">X-learner&lt;/h2>
&lt;p>The X-learner first estimates the response functions ($\hat{\mu}_0$ and $\hat{\mu}_1$), then computes the imputed treatment effects ($\tilde{D}_{i}^{1}$ and $\tilde{D}_{i}^{0}$), then estimates the conditional average treatment effect for treated and controls ($\hat{\tau}_1$ and $\hat{\tau}_0$), and finally takes a weighted average of the two estimates.&lt;/p>
&lt;p>$$\begin{aligned}
\hat{\mu}_0 = M_1(Y^0 \sim X^0)\\
\hat{\mu}_1 = M_2(Y^1 \sim X^1)\\
\tilde{D}_{i}^{1}=Y_i^1 - \hat{\mu}_{0}(X_i^1)\\
\tilde{D}_{i}^{0}=\hat{\mu}_1(X_i^0) - Y_{i}^{0}\\
\hat{\tau}_1=M_3(\tilde{D}^{1} \sim X^{1})\\
\hat{\tau}_0=M_4(\tilde{D}^{0} \sim X^{0})\\
\hat{\tau}(x)=g(x)\hat{\tau}_0(x)+(1-g(x))\hat{\tau}_1(x)
\end{aligned}$$&lt;/p>
&lt;p>$g(x)\in[0,1]$ is a weighting function chosen to minimize the variance of $\hat{\tau}(x)$. $g(x)$ can be estimated by the propensity score or set to a constant, such as the proportion of treated samples.&lt;/p>
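&lt;p>The four X-learner steps can be sketched as follows, assuming a single binary covariate, per-cell means in place of each model $M_1,\dots,M_4$, and the treated fraction per stratum as $g(x)$. The toy data and the helpers &lt;code>fit_cell_means&lt;/code> and &lt;code>arm&lt;/code> are assumptions for illustration.&lt;/p>

```python
from collections import defaultdict

# Hypothetical toy data: binary covariate x, treatment w, binary outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [0, 0, 0, 1, 0, 1, 1, 1]
y = [0, 0, 1, 1, 0, 1, 1, 0]


def fit_cell_means(features, targets):
    """Stand-in for a model: predict the mean target for each feature value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for f, t in zip(features, targets):
        sums[f] += t
        counts[f] += 1
    return {f: sums[f] / counts[f] for f in sums}


def arm(values, w, which):
    """Select the entries of values whose treatment equals which."""
    return [v for v, wi in zip(values, w) if wi == which]


x0, y0 = arm(x, w, 0), arm(y, w, 0)
x1, y1 = arm(x, w, 1), arm(y, w, 1)

# Step 1: response functions, one per arm.
mu0 = fit_cell_means(x0, y0)
mu1 = fit_cell_means(x1, y1)

# Step 2: imputed treatment effects for treated (d1) and controls (d0).
d1 = [yi - mu0[xi] for xi, yi in zip(x1, y1)]
d0 = [mu1[xi] - yi for xi, yi in zip(x0, y0)]

# Step 3: effect models fitted on the imputed effects.
tau1 = fit_cell_means(x1, d1)
tau0 = fit_cell_means(x0, d0)

# Step 4: combine with g(x), here the treated fraction in each stratum.
g = fit_cell_means(x, w)
effects = {
    xi: g[xi] * tau0[xi] + (1 - g[xi]) * tau1[xi] for xi in sorted(set(x))
}
```

&lt;p>With per-cell means the combined estimate coincides with $\hat{\mu}_1(x)-\hat{\mu}_0(x)$ for any choice of $g$. The weighting only matters when the models pool information across cells, for example a regularized or parametric model.&lt;/p>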
&lt;h2 id="g-computation">G-Computation&lt;/h2>
&lt;p>Parameters we can derive from the G-Computation:&lt;/p>
&lt;ul>
&lt;li>For a binary outcome, $Y\in\{0,1\}$, $P(Y=1)=E[Y]$.&lt;/li>
&lt;li>$E[Y^1]-E[Y^0]$ is the causal risk difference due to treatment&lt;/li>
&lt;li>$E[Y^1]/E[Y^0]$ is the causal relative risk&lt;/li>
&lt;li>$\frac{E[Y^1](1-E[Y^0])}{E[Y^0](1-E[Y^1])}$ is the causal odds ratio due to treatment&lt;/li>
&lt;/ul></description></item><item><title>Cloudflare pages for static hosting</title><link>https://asifr.com/cloudflare-pages-static-hosting/</link><pubDate>Thu, 21 Jan 2021 00:00:00 +0000</pubDate><guid>https://asifr.com/cloudflare-pages-static-hosting/</guid><description>
&lt;p>The asifr.com website has been hosted on a cheap DigitalOcean server for the last 8 years. I hadn&amp;rsquo;t touched the server for many years and it stagnated running Ubuntu 14 and didn&amp;rsquo;t have HTTPS. Today, I migrated the static site to Cloudflare Pages for free hosting in two steps:&lt;/p>
&lt;ol>
&lt;li>Create a GitHub repository with the static files. The repo can be public or private.&lt;/li>
&lt;li>Connect Cloudflare Pages to the GitHub repo and set the path to the static files in your repo (e.g. &lt;code>/public&lt;/code> or &lt;code>/dist&lt;/code>).&lt;/li>
&lt;/ol>
&lt;p>Offline, I use pandoc to compile a folder of Markdown files into static HTML files.&lt;/p>
&lt;p>Cloudflare Pages automatically redeploys the site when I commit changed files to the GitHub repo. Since the static files are generated offline and there is no cloud-based build step, Cloudflare Pages simply serves the public static files.&lt;/p>
&lt;p>Cloudflare gives a nice domain name, asifr.pages.dev, and also provides a custom DNS service so I can point asifr.com to asifr.pages.dev. DNS propagation takes a few hours once the nameservers are transferred and the CNAME record is created. Now asifr.com has secure HTTPS connections and is served from Cloudflare&amp;rsquo;s edge network.&lt;/p>
&lt;p>The benefits: hosting is free, since Cloudflare Pages provides unlimited bandwidth, requests, and sites (the free tier is limited to 500 builds per month, which is plenty for a simple personal site that is updated infrequently). I also get security, HTTPS, and analytics, and users get fast access to the site from anywhere in the world.&lt;/p></description></item><item><title>Interpretable univariate risk curves</title><link>https://asifr.com/interpretable-risk-curves/</link><pubDate>Sun, 05 Jul 2020 00:00:00 +0000</pubDate><guid>https://asifr.com/interpretable-risk-curves/</guid><description>
&lt;p>Highly accurate complex models, like neural networks and generalized additive models, come at the expense of interpretability. The contributions of individual features to the outcome are difficult to understand in complex models, and such models have received significant criticism, particularly in high-stakes settings (like medicine and criminal cases) where trust is critical. &lt;a href="http://people.dbmi.columbia.edu/noemie/papers/15kdd.pdf">Intelligible models&lt;/a> build trust with the user and help us debug counter-intuitive or inaccurate relationships learned by the model. Before applying complex models, it&amp;rsquo;s usually a good idea to look at odds ratios and risk curves to visualize how each variable is related to the outcome. Odds ratios quantify the strength of the relationship and are a good first step in ranking features, separating those with a strong association with the outcome from those with weaker associations. Risk curves help us visualize the shape of the association (e.g. linear, non-linear, monotonic, increasing, decreasing).&lt;/p>
&lt;h2 id="odds-ratios">Odds-ratios&lt;/h2>
&lt;p>Odds ratios are widely used to compare the relative odds of the occurrence of the outcome (e.g. disease), given exposure to a feature. The odds ratio can also be used to determine whether a particular feature is a risk factor for an outcome, and to compare the magnitude of various risk factors for that outcome.&lt;/p>
&lt;ul>
&lt;li>OR = 1 Feature does not affect odds of outcome&lt;/li>
&lt;li>OR &amp;gt; 1 Feature is associated with higher odds of outcome&lt;/li>
&lt;li>OR &amp;lt; 1 Feature is associated with lower odds of outcome&lt;/li>
&lt;/ul>
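&lt;p>To make the interpretation above concrete, here is a minimal sketch that computes an odds ratio directly from a 2x2 table of counts. The function &lt;code>odds_ratio&lt;/code> and the counts are made up for illustration.&lt;/p>

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of the outcome among the exposed divided by the odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed

# Hypothetical counts: 30 of 100 exposed patients and 10 of 100 unexposed
# patients had the outcome.
or_value = odds_ratio(30, 70, 10, 90)
print(round(or_value, 2))  # 3.86
```

&lt;p>Here the odds ratio is above 1, so in this made-up table exposure is associated with higher odds of the outcome.&lt;/p>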
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> statsmodels.api &lt;span style="color:#f00">as&lt;/span> sm
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> scipy &lt;span style="color:#f00">import&lt;/span> stats
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>stats.chisqprob = &lt;span style="color:#f00">lambda&lt;/span> chisq, df: stats.chi2.sf(chisq, df)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">univariatelr&lt;/span>(X,Y,feature_name,binary=&lt;span style="color:#f00">False&lt;/span>):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> res = sm.Logit(Y, X, missing=&lt;span style="color:#87ceeb">&amp;#39;drop&amp;#39;&lt;/span>).fit(disp=&lt;span style="color:#f60">0&lt;/span>, intercept=&lt;span style="color:#f00">True&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ci = np.round(np.exp(res.conf_int(alpha=&lt;span style="color:#f60">0.05&lt;/span>, cols=&lt;span style="color:#f00">None&lt;/span>)),&lt;span style="color:#f60">2&lt;/span>).squeeze()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> r = {&lt;span style="color:#87ceeb">&amp;#39;feature&amp;#39;&lt;/span>:feature_name,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;nobs&amp;#39;&lt;/span>: np.sum(X==&lt;span style="color:#f60">1&lt;/span>) &lt;span style="color:#f00">if&lt;/span> binary &lt;span style="color:#f00">else&lt;/span> res.nobs,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;coeff&amp;#39;&lt;/span>: round(res.params[&lt;span style="color:#f60">0&lt;/span>],&lt;span style="color:#f60">2&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;OR&amp;#39;&lt;/span>: round(np.exp(res.params[&lt;span style="color:#f60">0&lt;/span>]),&lt;span style="color:#f60">2&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;ci_low&amp;#39;&lt;/span>: ci[&lt;span style="color:#f60">0&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;ci_high&amp;#39;&lt;/span>: ci[&lt;span style="color:#f60">1&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;p-value&amp;#39;&lt;/span>: np.round(res.pvalues[&lt;span style="color:#f60">0&lt;/span>],&lt;span style="color:#f60">2&lt;/span>)}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> r
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The coefficients of a logistic regression model are log-odds ratios, so taking the exponential of a coefficient gives the odds ratio. Odds ratios are plotted along with their confidence intervals. If the confidence interval overlaps OR = 1, then the feature has a weak or no association with the outcome. Below is an example of the odds ratios from univariate logistic regressions where the outcome label is survived or expired in the ICU.&lt;/p>
&lt;p>&lt;img src="images/univariate_or.png" alt="">&lt;/p>
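&lt;p>The conversion from a coefficient to an odds ratio, and the check against OR = 1, can be sketched in a few lines. This assumes a Wald interval (coefficient plus or minus 1.96 standard errors), which is one common choice; the helper names and the numbers are hypothetical.&lt;/p>

```python
import math

def or_with_ci(coef, std_err, z=1.96):
    """Convert a logistic regression coefficient to an odds ratio with a Wald interval."""
    return (math.exp(coef),
            math.exp(coef - z * std_err),
            math.exp(coef + z * std_err))

def excludes_one(ci_low, ci_high):
    """True when the interval lies entirely above or entirely below OR = 1."""
    return ci_low > 1.0 or 1.0 > ci_high

# Hypothetical coefficient and standard error from a univariate fit.
or_value, ci_low, ci_high = or_with_ci(coef=0.5, std_err=0.2)
is_associated = excludes_one(ci_low, ci_high)
```

&lt;p>Exponentiating the interval endpoints, rather than building an interval around the odds ratio itself, keeps the bounds positive and matches how the coefficient is estimated on the log scale.&lt;/p>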
&lt;h2 id="risk-curves">Risk curves&lt;/h2>
&lt;p>Risk curves describe the relationship between a feature and an outcome. For example, a risk curve can be used to understand the relationship between a physiological measure like heart rate and an outcome like the probability of developing a disease. These graphical interpretations help explain hidden relationships in data, are interpretable by a non-technical audience, and are relatively easy to construct.&lt;/p>
&lt;p>&lt;img src="images/lactate_risk_curve.png" alt="">&lt;/p>
&lt;p>The risk curve for lactate shows that the risk of mortality monotonically increases with increasing lactate. We could also reasonably draw a cutoff around lactate &amp;gt; 2 mmol/L and group these high-lactate patients for further analysis.&lt;/p>
&lt;p>Building such a risk curve is simple enough. Given a table of feature value and label:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Value&lt;/th>
&lt;th>Label&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>0.800&lt;/td>
&lt;td>0&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>4.767&lt;/td>
&lt;td>1&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>0.700&lt;/td>
&lt;td>0&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>1.100&lt;/td>
&lt;td>0&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>1.575&lt;/td>
&lt;td>0&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;p>First bucket each sample into a bin, for example: &lt;code>(0.1, 0.6)&lt;/code>, &lt;code>(0.6, 1.2)&lt;/code>, &amp;hellip;, &lt;code>(4.4, 5.0)&lt;/code>. You can use quantiles to determine these bins or just divide the range of your data into equal-width bins. We can use the &lt;code>histogram&lt;/code> function to divide our data into equal-width bins over a specified range and count the number of samples that fall into each bin (&lt;code>h_control&lt;/code>, &lt;code>h_treated&lt;/code>).&lt;/p>
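&lt;p>For the quantile option mentioned above, a minimal standard-library sketch looks like this. The helper names are hypothetical, and the last three values are made up to extend the five rows in the table.&lt;/p>

```python
import bisect
import statistics

def quantile_bin_edges(values, n_bins):
    """Interior cut points that split the values into n_bins roughly equal-count bins."""
    return statistics.quantiles(values, n=n_bins)

def assign_bin(value, edges):
    """Index of the bin a value falls into, given sorted interior edges."""
    return bisect.bisect_right(edges, value)

# First five values come from the table above; the rest are illustrative.
values = [0.8, 4.767, 0.7, 1.1, 1.575, 2.2, 3.1, 0.9]

edges = quantile_bin_edges(values, 4)
bins = [assign_bin(v, edges) for v in values]
counts = [bins.count(b) for b in range(4)]
```

&lt;p>Quantile bins hold roughly the same number of samples each, which keeps the per-bin estimates stable in the sparse tails, while equal-width bins are easier to read off the x-axis.&lt;/p>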
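&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> matplotlib.pyplot &lt;span style="color:#f00">as&lt;/span> plt
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>x = df.Value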
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>y = df.Label
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>x = x[(x&amp;gt;=minval) &amp;amp; (x&amp;lt;=maxval)]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>xmin = x.min()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>xmax = x.max()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>h_control, b_control = np.histogram(x[y==&lt;span style="color:#f60">0&lt;/span>], range=(xmin, xmax), bins=&lt;span style="color:#f60">10&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>h_treated, b_treated = np.histogram(x[y==&lt;span style="color:#f60">1&lt;/span>], range=(xmin, xmax), bins=&lt;span style="color:#f60">10&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>risk = np.log((h_treated/h_treated.sum())/(h_control/h_control.sum()))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>vals = b_treated[:-&lt;span style="color:#f60">1&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>plt.plot(vals, risk)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The &lt;code>risk&lt;/code> value is the log of a ratio: the fraction of all treated (label 1) samples that fall in the bin divided by the fraction of all untreated (label 0) samples that fall in the bin. This equals the log-odds of the label within the bin, up to a constant offset given by the overall log-odds of the label.&lt;/p></description></item><item><title>Transform Grouped Pandas DataFrame to Numpy Array</title><link>https://asifr.com/transform-grouped-dataframe-to-numpy/</link><pubDate>Mon, 22 Jun 2020 00:00:00 +0000</pubDate><guid>https://asifr.com/transform-grouped-dataframe-to-numpy/</guid><description>
&lt;p>This snippet transforms a tall Pandas DataFrame with time-series data into a NumPy array while preserving the grouping. This is a common use case for me when preparing training data for recurrent neural networks, where each training sample belongs to a group (&lt;code>EventID&lt;/code> below), feature values (&lt;code>FeatureValue&lt;/code>) are ordered by time (&lt;code>DateTime&lt;/code>), and I want to get the length of each sample (needed to train an RNN with variable length sequences).&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>EventID&lt;/th>
&lt;th>DateTime&lt;/th>
&lt;th>FeatureValue&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>0&lt;/td>
&lt;td>80&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>1&lt;/td>
&lt;td>5&lt;/td>
&lt;td>90&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>0&lt;/td>
&lt;td>75&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>2&lt;/td>
&lt;td>10&lt;/td>
&lt;td>80&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
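&lt;p>To make the target layout concrete, here is a plain-Python sketch of what grouping the table above by &lt;code>EventID&lt;/code> produces: one time-ordered sequence per group plus its length. The extra row for event 2 and the right-padding step are assumptions added for illustration, since a fixed-size batch needs sequences of equal length.&lt;/p>

```python
from itertools import groupby
from operator import itemgetter

# Rows of (EventID, DateTime, FeatureValue): the table above plus one extra row.
rows = [(1, 0, 80), (1, 5, 90), (2, 0, 75), (2, 10, 80), (2, 20, 85)]

# Sort by group and then by time so each sequence is in temporal order.
rows.sort(key=itemgetter(0, 1))

# One list of (time, value) steps per event.
sequences = {event_id: [(t, v) for _, t, v in group]
             for event_id, group in groupby(rows, key=itemgetter(0))}
lengths = {event_id: len(seq) for event_id, seq in sequences.items()}

def pad(seq, length, fill=(0, 0)):
    """Right-pad a sequence with a filler step up to the requested length."""
    return seq + [fill] * (length - len(seq))

max_len = max(lengths.values())
padded = [pad(seq, max_len) for seq in sequences.values()]
```

&lt;p>The stored lengths let a model ignore the padded steps at the end of the shorter sequences.&lt;/p>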
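&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>event_col = &lt;span style="color:#87ceeb">&amp;#39;EventID&amp;#39;&lt;/span>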
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>time_col = &lt;span style="color:#87ceeb">&amp;#39;DateTime&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>value_col = &lt;span style="color:#87ceeb">&amp;#39;FeatureValue&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>xt = df.loc[:,[time_col, value_col]].values
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>g = df.reset_index(drop=&lt;span style="color:#f00">True&lt;/span>).groupby(event_col)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>xtg = [xt[i.values,:] &lt;span style="color:#f00">for&lt;/span> k,i in g.groups.items()]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>SignalLengths = [len(i.values) &lt;span style="color:#f00">for&lt;/span> k,i in g.groups.items()]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>X_signal = np.array(xtg, dtype=object)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>EventIDs = list(g.groups.keys())
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>AWS Lambda Web Scraper</title><link>https://asifr.com/aws-lambda-scraper/</link><pubDate>Sun, 21 Jun 2020 00:00:00 +0000</pubDate><guid>https://asifr.com/aws-lambda-scraper/</guid><description>
&lt;p>This is a small AWS Lambda function to scrape websites using &lt;code>axios&lt;/code> and store the data in a MongoDB document. You can set up an API Gateway endpoint for the Lambda function and call it with &lt;code>POST&lt;/code> requests carrying a JSON body.&lt;/p>
&lt;p>&lt;strong>Features&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Randomly selects from a set of headers with each call.&lt;/li>
&lt;li>Automatically sets the host and referer to the same domain.&lt;/li>
&lt;li>Saves the response to MongoDB.&lt;/li>
&lt;li>Optionally sets the &lt;code>Accept&lt;/code> header to JSON if you expect a JSON response.&lt;/li>
&lt;li>Optionally sets the &lt;code>X-Requested-With&lt;/code> header to &lt;code>XMLHttpRequest&lt;/code>.&lt;/li>
&lt;/ul>
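&lt;p>The handler further down reads its input from a JSON request body: &lt;code>url&lt;/code> is required, and the presence of a &lt;code>json&lt;/code> or &lt;code>ajax&lt;/code> key switches on the optional headers. A small client-side sketch of building that body is shown below; the helper name &lt;code>build_payload&lt;/code> and the example URL are placeholders, not part of the function itself.&lt;/p>

```python
import json

def build_payload(url, expect_json=False, ajax=False):
    """Build the JSON body the handler reads; only 'url' is required."""
    payload = {"url": url}
    if expect_json:
        payload["json"] = True  # the handler only checks that the key is present
    if ajax:
        payload["ajax"] = True  # likewise, presence of the key is what matters
    return json.dumps(payload)

# Placeholder target URL for illustration.
body = build_payload("https://example.com/api/items", expect_json=True)
print(body)
```

&lt;p>The resulting string would be sent as the request body to whatever endpoint the API Gateway exposes for the function.&lt;/p>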
&lt;p>Install the required Node modules: &lt;code>npm install axios mongodb dotenv&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-javascript" data-lang="javascript">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">const&lt;/span> request = require(&lt;span style="color:#87ceeb">&amp;#39;axios&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">const&lt;/span> MongoClient = require(&lt;span style="color:#87ceeb">&amp;#39;mongodb&amp;#39;&lt;/span>).MongoClient;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">const&lt;/span> crypto = require(&lt;span style="color:#87ceeb">&amp;#39;crypto&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">// local .env files are loaded into process.env
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span>require(&lt;span style="color:#87ceeb">&amp;#39;dotenv&amp;#39;&lt;/span>).config({silent: &lt;span style="color:#f00">false&lt;/span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">// load the MongoDB connection string from the .env file
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span>&lt;span style="color:#f00">const&lt;/span> mongo_host = process.env.MONGO
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">// database and collection name
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span>&lt;span style="color:#f00">const&lt;/span> databaseName = &lt;span style="color:#87ceeb">&amp;#39;scraper&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">const&lt;/span> collectionName = &lt;span style="color:#87ceeb">&amp;#39;rawdata&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">// set of headers from which we will randomly select
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span>&lt;span style="color:#f00">let&lt;/span> headers_list = [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// Firefox 77 Mac
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;User-Agent&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko/20100101 Firefox/77.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept-Language&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;en-US,en;q=0.5&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Referer&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;https://www.google.com/&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;DNT&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Connection&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;keep-alive&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Upgrade-Insecure-Requests&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;1&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// Firefox 77 Windows
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;User-Agent&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept-Language&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;en-US,en;q=0.5&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept-Encoding&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;gzip, deflate, br&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Referer&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;https://www.google.com/&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;DNT&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Connection&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;keep-alive&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Upgrade-Insecure-Requests&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;1&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// Chrome 83 Mac
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Connection&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;keep-alive&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;DNT&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Upgrade-Insecure-Requests&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;User-Agent&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Sec-Fetch-Site&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;none&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Sec-Fetch-Mode&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;navigate&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Sec-Fetch-Dest&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;document&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Referer&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;https://www.google.com/&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept-Encoding&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;gzip, deflate, br&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept-Language&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;en-GB,en-US;q=0.9,en;q=0.8&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// Chrome 83 Windows
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Connection&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;keep-alive&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Upgrade-Insecure-Requests&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;User-Agent&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Sec-Fetch-Site&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;same-origin&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Sec-Fetch-Mode&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;navigate&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Sec-Fetch-User&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;?1&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Sec-Fetch-Dest&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;document&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Referer&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;https://www.google.com/&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept-Encoding&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;gzip, deflate, br&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#34;Accept-Language&amp;#34;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#34;en-US,en;q=0.9&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">function&lt;/span> isValidURL(string) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">var&lt;/span> res = string.match(&lt;span style="color:#87ceeb">/(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&amp;amp;//=]*)/g&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> (res !== &lt;span style="color:#f00">null&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>};
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>module.exports.scrape = &lt;span style="color:#f00">async&lt;/span> event =&amp;gt; {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// start by parsing the body assuming a POST statement with a JSON body
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">let&lt;/span> body = JSON.parse(event.body)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// url is required
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">if&lt;/span> (!(&lt;span style="color:#87ceeb">&amp;#39;url&amp;#39;&lt;/span> &lt;span style="color:#f00">in&lt;/span> body)) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {ok: &lt;span style="color:#f60">0&lt;/span>, msg: &lt;span style="color:#87ceeb">&amp;#39;Missing URL&amp;#39;&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// check the url is valid
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">if&lt;/span> (!isValidURL(body.url)) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {ok: &lt;span style="color:#f60">0&lt;/span>, msg: &lt;span style="color:#87ceeb">&amp;#39;Invalid URL&amp;#39;&lt;/span>}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">let&lt;/span> url = body.url
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">let&lt;/span> host = &lt;span style="color:#f00">new&lt;/span> URL(url)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// randomly select a header set and copy it, so the edits below do not persist across warm invocations
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">let&lt;/span> headers = {...headers_list[Math.floor(Math.random() * headers_list.length)]}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// the request should look like it is originating from the host
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> headers[&lt;span style="color:#87ceeb">&amp;#39;Host&amp;#39;&lt;/span>] = host.host
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// referer is from the same domain, referers from google.com are often
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#0f0">// redirected, which we want to avoid
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> headers[&lt;span style="color:#87ceeb">&amp;#39;Referer&amp;#39;&lt;/span>] = host.origin
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// set json headers if we expect the response to be in json
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">if&lt;/span> (&lt;span style="color:#87ceeb">&amp;#39;json&amp;#39;&lt;/span> &lt;span style="color:#f00">in&lt;/span> body) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> headers[&lt;span style="color:#87ceeb">&amp;#39;Accept&amp;#39;&lt;/span>] = &lt;span style="color:#87ceeb">&amp;#39;application/json, text/javascript, */*; q=0.01&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// set XMLHttpRequest header, which helps when calling private APIs that
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#0f0">// would typically be loaded by AJAX calls
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">if&lt;/span> (&lt;span style="color:#87ceeb">&amp;#39;ajax&amp;#39;&lt;/span> &lt;span style="color:#f00">in&lt;/span> body) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> headers[&lt;span style="color:#87ceeb">&amp;#39;X-Requested-With&amp;#39;&lt;/span>] = &lt;span style="color:#87ceeb">&amp;#39;XMLHttpRequest&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// send a GET request with our headers
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">const&lt;/span> response = &lt;span style="color:#f00">await&lt;/span> request({
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;url&amp;#39;&lt;/span>: url,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;method&amp;#39;&lt;/span>: &lt;span style="color:#87ceeb">&amp;#39;get&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;headers&amp;#39;&lt;/span>: headers,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> });
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> (response.status == &lt;span style="color:#f60">200&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// create a data object containing the response body and headers
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">let&lt;/span> date = &lt;span style="color:#f00">new&lt;/span> Date()
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">let&lt;/span> data = {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;url&amp;#39;&lt;/span>: url,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;url_hash&amp;#39;&lt;/span>: crypto.createHash(&lt;span style="color:#87ceeb">&amp;#39;md5&amp;#39;&lt;/span>).update(url).digest(&lt;span style="color:#87ceeb">&amp;#34;hex&amp;#34;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;host&amp;#39;&lt;/span>: host.host,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;data&amp;#39;&lt;/span>: response.data,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;processed&amp;#39;&lt;/span>: &lt;span style="color:#f00">false&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;scraped_at&amp;#39;&lt;/span>: date,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;scraped_year&amp;#39;&lt;/span>: date.getFullYear(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;scraped_month&amp;#39;&lt;/span>: date.getMonth() + &lt;span style="color:#f60">1&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;scraped_day&amp;#39;&lt;/span>: date.getDate(),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;response_headers&amp;#39;&lt;/span>: response.headers,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#87ceeb">&amp;#39;request_headers&amp;#39;&lt;/span>: headers
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// create a connection to the MongoDB
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">const&lt;/span> client = &lt;span style="color:#f00">await&lt;/span> MongoClient.connect(mongo_host, {useUnifiedTopology: &lt;span style="color:#f00">true&lt;/span>});
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// select the database
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">const&lt;/span> db = client.db(databaseName);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// insert data into collection in database
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">let&lt;/span> r = &lt;span style="color:#f00">await&lt;/span> db.collection(collectionName).insertOne(data);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// close the connection to MongoDB
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> client.close();
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">if&lt;/span> (r.insertedCount == &lt;span style="color:#f60">1&lt;/span>) {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#0f0">// return the newly created ObjectID if a new document was successfully inserted
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0">&lt;/span> &lt;span style="color:#f00">return&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ok: &lt;span style="color:#f60">1&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> url: url,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> insertedId: r.insertedId,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> } &lt;span style="color:#f00">else&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ok: &lt;span style="color:#f60">0&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> url: url,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> status: response.status,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> msg: &lt;span style="color:#87ceeb">&amp;#39;Bad response status&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f00">return&lt;/span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ok: &lt;span style="color:#f60">0&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> url: url
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> };
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>};
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Deploy static website with rsync</title><link>https://asifr.com/rsync-deploy-static-website/</link><pubDate>Sun, 21 Jun 2020 00:00:00 +0000</pubDate><guid>https://asifr.com/rsync-deploy-static-website/</guid><description>
&lt;p>Create a Makefile and use &lt;code>rsync&lt;/code> to deploy your &lt;code>./public&lt;/code> folder to a remote server. Replace &lt;code>&amp;lt;IP-ADDRESS&amp;gt;&lt;/code> with your server&amp;rsquo;s IP address or hostname and set &lt;code>FOLDER&lt;/code> to the remote directory where you want to upload your local files. The recipes read exclude patterns, one per line, from an &lt;code>exclude.rsync&lt;/code> file next to the Makefile. In practice it&amp;rsquo;s nice to have a staging folder to test out your website before deployment. Use &lt;code>make staging&lt;/code> to sync changes to the staging folder and &lt;code>make deploy&lt;/code> to sync them to the live site.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>.PHONY: staging deploy
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">SERVER&lt;/span> = &amp;lt;IP-ADDRESS&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">FOLDER&lt;/span> = /var/www/html
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">STAGING&lt;/span> = /var/www/html/staging
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#eedd82">USER&lt;/span> = root
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>deploy:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rsync -zarvh ./public/* --compress --recursive --checksum --delete --itemize-changes --exclude-from exclude.rsync &lt;span style="color:#f00">$(&lt;/span>USER&lt;span style="color:#f00">)&lt;/span>@&lt;span style="color:#f00">$(&lt;/span>SERVER&lt;span style="color:#f00">)&lt;/span>:&lt;span style="color:#f00">$(&lt;/span>FOLDER&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>staging:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rsync -zarvh ./public/* --compress --recursive --checksum --delete --itemize-changes --exclude-from exclude.rsync &lt;span style="color:#f00">$(&lt;/span>USER&lt;span style="color:#f00">)&lt;/span>@&lt;span style="color:#f00">$(&lt;/span>SERVER&lt;span style="color:#f00">)&lt;/span>:&lt;span style="color:#f00">$(&lt;/span>STAGING&lt;span style="color:#f00">)&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Svelte Webpack Boilerplate</title><link>https://asifr.com/svelte-webpack/</link><pubDate>Fri, 22 May 2020 00:00:00 +0000</pubDate><guid>https://asifr.com/svelte-webpack/</guid><description>
&lt;p>This setup uses &lt;code>webpack&lt;/code> to bundle code and &lt;a href="https://svelte.dev/">Svelte&lt;/a> to create the user interface. Webpack reads the entry files under &lt;code>/src&lt;/code> and saves the compiled and minified JavaScript in &lt;code>/public/js&lt;/code>, one bundle per entry.&lt;/p>
&lt;p>Install the required Node modules: &lt;code>npm install svelte svelte-loader&lt;/code>. Webpack itself (&lt;code>webpack&lt;/code> and &lt;code>webpack-cli&lt;/code>) must also be installed.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-js" data-lang="js">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">const&lt;/span> path = require(&lt;span style="color:#87ceeb">&amp;#39;path&amp;#39;&lt;/span>);
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>module.exports = {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> entry: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> dashboard: &lt;span style="color:#87ceeb">&amp;#39;./src/dashboard.js&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> account: &lt;span style="color:#87ceeb">&amp;#39;./src/account.js&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> output: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> path: path.resolve(__dirname, &lt;span style="color:#87ceeb">&amp;#39;public/js/&amp;#39;&lt;/span>),
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> filename: &lt;span style="color:#87ceeb">&amp;#34;[name].js&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mode: &lt;span style="color:#87ceeb">&amp;#34;production&amp;#34;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> module: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> rules: [
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> test: &lt;span style="color:#87ceeb">/\.(html|svelte)$/&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> exclude: &lt;span style="color:#87ceeb">/node_modules/&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> use: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> loader: &lt;span style="color:#87ceeb">&amp;#39;svelte-loader&amp;#39;&lt;/span>,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> options: {}
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> plugins: [],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> resolve: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> alias: {
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> svelte: path.resolve(&lt;span style="color:#87ceeb">&amp;#39;node_modules&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;svelte&amp;#39;&lt;/span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> },
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> extensions: [&lt;span style="color:#87ceeb">&amp;#39;.mjs&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;.js&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;.svelte&amp;#39;&lt;/span>],
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> mainFields: [&lt;span style="color:#87ceeb">&amp;#39;svelte&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;browser&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;module&amp;#39;&lt;/span>, &lt;span style="color:#87ceeb">&amp;#39;main&amp;#39;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> }
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>};
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Arrhythmia classification with stationary first order Markov process</title><link>https://asifr.com/arrhythmia-classification-with-markov-process/</link><pubDate>Mon, 24 Dec 2018 00:00:00 +0000</pubDate><guid>https://asifr.com/arrhythmia-classification-with-markov-process/</guid><description>
&lt;p>Time series sequences can be described by their statistical properties like the mean level, trend, periodicity, autocorrelation, variability, and entropy. Most sequence classification models exploit these and other statistical differences between time series signals. However, some features are computationally expensive to calculate and they may not have sufficient discriminative power. Here I will show how Markov chains can be used to derive a simple discriminative statistical measure to separate time series sequences by summarizing the transition probabilities between consecutive timesteps.&lt;/p>
&lt;p>A classic paper on ECG time series classification, &lt;a href="http://ecg.mit.edu/george/publications/afib-cinc-1983.pdf">A new method for detecting atrial fibrillation using R-R intervals&lt;/a> by George Moody and Roger Mark (1983), uses Markov chains to classify sequences of heart rate measurements as normal sinus rhythm (NS) or atrial fibrillation (AF). By using the sequence of heart rate values directly we don&amp;rsquo;t have to engineer new features and can work with the transition matrix derived from the observed sequence. This method is a good choice when signal variability is informative, as is the case in arrhythmia classification. The method is also computationally inexpensive and easy to deploy on low-power devices. The derived &lt;em>Markov score&lt;/em> is also discriminative and can be used as a feature in downstream machine learning models.&lt;/p>
&lt;p>The outline of the method is as follows: transitions between consecutive states are summarized in a transition matrix, one per class, assuming a stationary first-order Markov process (current value depends only on previous value). The log of the element-wise ratio of the two transition matrices for group A and group B yields the log-odds matrix $S$. The score for a new sequence is calculated as the sum of the log-odds over each observed transition.&lt;/p>
&lt;div style="background:white;text-align:center;">
&lt;p>&lt;img src="https://asifr.com/images/transition_matrix.png" alt="">&lt;/p>
&lt;/div>
&lt;p>&lt;em>Transition matrix&lt;/em>&lt;/p>
&lt;p>In more detail:&lt;/p>
&lt;ul>
&lt;li>Discretize heart rate into a number of bins. We&amp;rsquo;ll call these bins &lt;em>states&lt;/em>.&lt;/li>
&lt;li>Calculate a transition probability ($p_{i,j}$ = number of transitions from $state_i$ to $state_j$ / total number of transitions out of $state_i$) between each pair of states and store it in a transition matrix (above figure), so that each row sums to 1. We&amp;rsquo;ll call this matrix $T$. We want a transition matrix for each class. In our example we will be distinguishing a normal sinus rhythm ($T_\text{NS}$) from an arrhythmia like atrial fibrillation ($T_\text{AF}$).&lt;/li>
&lt;li>Calculate the odds ratio of observing a state transition in a normal rhythm versus an abnormal rhythm: $p_{i,j}^{\text{NS}} / p_{i,j}^{\text{AF}}$. Store the odds ratios for each state transition in a score matrix $S$. &lt;em>In practice we can get the score matrix $S$ simply by dividing the two transition matrices element-wise: $S = T_\text{NS} / T_\text{AF}$.&lt;/em>&lt;/li>
&lt;li>Take the log of the score matrix so $S = \log(T_\text{NS} / T_\text{AF})$. This gives us the &lt;em>log odds&lt;/em>. We can interpret the score matrix as $S_{i,j}=0$ if a transition from $state_i$ to $state_j$ is equally likely in a normal sinus and an AF rhythm. Note that $\log(A/B)$ is equivalent to $\log(A)-\log(B)$. It follows that positive values of $S$ ($S_{i,j}&amp;gt;0$) mean the transition from $state_i$ to $state_j$ is more likely to come from a normal rhythm, and negative values of $S$ ($S_{i,j}&amp;lt;0$) mean you are more likely to observe the transition in an AF rhythm.&lt;/li>
&lt;li>To get the final score we sum the log-odds from the score matrix over every transition in the observed sequence and choose an appropriate threshold below which we are confident the sequence comes from an AF rhythm.&lt;/li>
&lt;/ul>
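&lt;p>The steps above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up toy sequences, not the code used to produce the figures in this post. The helper names (&lt;code>transition_counts&lt;/code>, &lt;code>to_probabilities&lt;/code>, &lt;code>score_matrix&lt;/code>, &lt;code>markov_score&lt;/code>) and the pseudocount are assumptions: the pseudocount keeps every probability non-zero so that the log-ratio stays finite for transitions never seen in one class.&lt;/p>

```python
import numpy as np

def transition_counts(states, n_states):
    """Count transitions between consecutive states in one sequence."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(states, states[1:]):
        counts[i, j] += 1
    return counts

def to_probabilities(counts, pseudocount=1.0):
    """Row-normalize counts into transition probabilities.

    The pseudocount is an assumption of this sketch: it keeps every
    probability non-zero so the log-ratio below is always finite.
    """
    smoothed = counts + pseudocount
    return smoothed / smoothed.sum(axis=1, keepdims=True)

def score_matrix(T_ns, T_af):
    """Element-wise log-odds: positive favors NS, negative favors AF."""
    return np.log(T_ns / T_af)

def markov_score(states, S):
    """Sum the log-odds over every observed transition."""
    return sum(S[i, j] for i, j in zip(states, states[1:]))

# Toy training sequences with 3 states (made-up data, not ECG recordings).
ns_train = [1, 1, 1, 1, 2, 1, 1, 1, 0, 1, 1, 1]  # mostly stays in state 1
af_train = [0, 2, 1, 0, 2, 0, 1, 2, 0, 2, 1, 0]  # jumps between states

T_ns = to_probabilities(transition_counts(ns_train, 3))
T_af = to_probabilities(transition_counts(af_train, 3))
S = score_matrix(T_ns, T_af)

steady = markov_score([1, 1, 1, 1, 1], S)   # regular sequence
erratic = markov_score([0, 2, 0, 2, 1], S)  # irregular sequence
```

&lt;p>With these toy inputs the regular sequence gets a positive score and the irregular one a negative score, which matches the interpretation of $S$ given in the list.&lt;/p>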
&lt;p>I will use ECG data from the &lt;a href="https://physionet.org/challenge/2017/">Physionet 2017 Challenge&lt;/a> to illustrate how a Markov process is used to classify time series signals. Participants in the challenge had to classify ECG signals collected from mobile sensors as normal sinus rhythm, atrial fibrillation, noise or other. For our purposes we ignore the noise and other labels and simply classify normal and atrial fibrillation.&lt;/p>
&lt;p>Since the data is provided as raw ECG signals we first extract heart rate by detecting the R-peaks of the ECG beat and calculating the R to R intervals (heart rate = 60 / RR interval in seconds). The distribution of RR intervals between the two groups already shows a good separation between Normal and AF rhythms.&lt;/p>
&lt;div style="background:white;text-align:center;">
&lt;p>&lt;img src="https://asifr.com/images/rr-interval-dist.png" alt="">&lt;/p>
&lt;/div>
&lt;p>&lt;em>R-R interval distribution&lt;/em>&lt;/p>
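&lt;p>The conversion from R-peaks to R-R intervals and heart rate described above can be sketched as follows. The sketch assumes the R-peaks have already been detected and are given as sample indices; the peak positions, the function names, and the 300 Hz sampling rate are illustrative assumptions, and peak detection itself is not shown.&lt;/p>

```python
import numpy as np

def rr_intervals_ms(r_peak_samples, sampling_rate_hz):
    """Convert R-peak sample indices to R-R intervals in milliseconds."""
    peak_times_s = np.asarray(r_peak_samples) / sampling_rate_hz
    return np.diff(peak_times_s) * 1000.0

def heart_rate_bpm(rr_ms):
    """Heart rate = 60 / RR interval in seconds."""
    return 60.0 / (np.asarray(rr_ms) / 1000.0)

# Hypothetical R-peak positions (in samples); 300 Hz is an assumed rate.
r_peaks = [30, 270, 510, 750]
rr = rr_intervals_ms(r_peaks, sampling_rate_hz=300)
hr = heart_rate_bpm(rr)
```

&lt;p>Four peaks give three intervals, so a sequence of R-R intervals is always one element shorter than the list of peaks it was derived from.&lt;/p>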
&lt;p>Next, calculate the transition matrix for all RR intervals in the training data.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">import&lt;/span> numpy &lt;span style="color:#f00">as&lt;/span> np
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">def&lt;/span> &lt;span style="color:#ff0">transition_matrix&lt;/span>(transitions):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#87ceeb">&amp;#34;&amp;#34;&amp;#34;Takes an array of discrete values and returns a transition matrix
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">    Args:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        transitions: list of integer states, in time order
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">    Returns:
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">        transition matrix; each row with observed transitions sums to 1
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb">    &amp;#34;&amp;#34;&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    n = &lt;span style="color:#f60">1&lt;/span> + max(transitions) &lt;span style="color:#0f0"># number of states&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    M = np.zeros((n,n))
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">for&lt;/span> (i,j) in zip(transitions,transitions[&lt;span style="color:#f60">1&lt;/span>:]):
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        M[i][j] += &lt;span style="color:#f60">1&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#0f0"># now convert to probabilities:&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">for&lt;/span> row in M:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        s = sum(row)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>        &lt;span style="color:#f00">if&lt;/span> s &amp;gt; &lt;span style="color:#f60">0&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>            row[:] = [f/s &lt;span style="color:#f00">for&lt;/span> f in row]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>    &lt;span style="color:#f00">return&lt;/span> np.array(M)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Each row of the transition matrix sums to 1 and shows the probability of moving from one RR interval bin at time $t-1$ to each possible bin at time $t$. The RR intervals (in milliseconds) were first discretized into bins 50 ms wide. The coarseness of the discretization is a hyperparameter that can be tuned to find the level that gives the best class separation.&lt;/p>
&lt;div style="background:white;text-align:center;">
&lt;p>&lt;img src="https://asifr.com/images/ns-af-transition-matrix.png" alt="">&lt;/p>
&lt;/div>
&lt;p>&lt;em>Normal sinus rhythm (left), Atrial fibrillation (right) transition matrix&lt;/em>&lt;/p>
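&lt;p>The binning step mentioned above can be sketched with integer division. The function name &lt;code>discretize_rr&lt;/code> and the upper clip &lt;code>max_ms&lt;/code> are assumptions added for illustration; the clip simply bounds the number of states so that one very long interval does not create a huge, mostly empty matrix.&lt;/p>

```python
import numpy as np

def discretize_rr(rr_ms, step_ms=50, max_ms=2000):
    """Map R-R intervals (ms) to integer state indices using fixed-width bins.

    Intervals above max_ms are clipped into the last bin; max_ms is an
    assumption of this sketch, chosen to bound the number of states.
    """
    rr = np.clip(np.asarray(rr_ms, dtype=float), 0, max_ms)
    return (rr // step_ms).astype(int)

# Made-up R-R intervals in milliseconds.
rr_ms = [812, 798, 640, 1105, 2600]
states = discretize_rr(rr_ms)
```

&lt;p>The resulting integer states can be passed directly to a function like &lt;code>transition_matrix&lt;/code>, and changing &lt;code>step_ms&lt;/code> is how the coarseness of the discretization would be tuned.&lt;/p>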
&lt;p>Dividing the two transition matrices element-wise and taking the log gives us the score matrix, where each element is the log-odds ratio of observing a transition between two consecutive states. In the score matrix below, transitions in red are more likely to be observed in atrial fibrillation rhythms and transitions in green are more likely to be observed in normal sinus rhythm.&lt;/p>
&lt;div style="background:white;text-align:center;">
&lt;p>&lt;img src="https://asifr.com/images/score-matrix.png" alt="">&lt;/p>
&lt;/div>
&lt;p>&lt;em>Score matrix&lt;/em>&lt;/p>
&lt;p>The final score is calculated by summing the log-odds over all transitions in a newly observed sequence of $N$ discretized RR intervals, where $x_t$ is the state at time $t$:&lt;/p>
&lt;p>$$M = \sum_{t=2}^{N} S_{x_{t-1},x_t}$$&lt;/p>
&lt;p>The score can be plotted over time to show how the score changes with new observations. Examples for normal sinus rhythms have an increasing score and examples with atrial fibrillation have a decreasing score. We can use a cutoff threshold of zero such that a score &amp;gt; 0 will classify the example as a normal sinus rhythm.&lt;/p>
&lt;div style="background:white;text-align:center;">
&lt;p>&lt;img src="https://asifr.com/images/markov-score.png" alt="">&lt;/p>
&lt;/div>
&lt;p>&lt;em>Markov score&lt;/em>&lt;/p>
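&lt;p>The running score and the zero threshold described above can be sketched as a cumulative sum. The 2-state score matrix below is hypothetical and chosen only to make the behavior easy to follow; the function names are also assumptions.&lt;/p>

```python
import numpy as np

def running_markov_score(states, S):
    """Cumulative sum of log-odds after each observed transition."""
    steps = [S[i, j] for i, j in zip(states, states[1:])]
    return np.cumsum(steps)

def classify(states, S, threshold=0.0):
    """Label a sequence NS when its final score exceeds the threshold."""
    final_score = running_markov_score(states, S)[-1]
    return "NS" if final_score > threshold else "AF"

# Hypothetical 2-state score matrix: staying in a state favors NS,
# switching states favors AF.
S = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

trace = running_markov_score([0, 0, 0, 1, 1], S)
label_regular = classify([0, 0, 0, 0], S)
label_erratic = classify([0, 1, 0, 1], S)
```

&lt;p>Plotting &lt;code>trace&lt;/code> against time gives the kind of curve shown in the figure: it rises while the sequence looks like a normal rhythm and falls on transitions that are more typical of AF.&lt;/p>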
&lt;p>&lt;strong>Some closing thoughts&lt;/strong>: Why is the &lt;em>Markov score&lt;/em> discriminative? Because we calculated the log-odds ratio between the two classes. The score works well in binary classification where the variability between consecutive observations is an informative discriminator. It may not work well in cases where the signal is periodic and the classification accuracy depends on information captured in the periodicity of the signal. In these cases a simple differencing or &lt;a href="https://otexts.org/fpp2/stl.html">STL decomposition&lt;/a> may help remove the periodic signal. The score presented here is also limited by the fact that it is first-order, meaning we only look at transitions between consecutive time steps $t-1$ and $t$. In practice it may be better to use the score as a feature in a machine learning classifier since it captures a discriminative attribute regarding the variability of the signal.&lt;/p></description></item><item><title>Nowcasting: Maintaining real time estimates of infrequently observed time series</title><link>https://asifr.com/nowcasting/</link><pubDate>Wed, 05 Dec 2018 00:00:00 +0000</pubDate><guid>https://asifr.com/nowcasting/</guid><description>
&lt;p>Time series analysis appears in every discipline from physiology to retail pricing. A time series variable is typically measured sequentially over time, often at equally spaced intervals but not necessarily. Variables may be measured less frequently than theoretically possible for reasons of cost, effort, or convention. With local linear trend models we can maintain real-time estimates of infrequently measured values (see &lt;a href="http://people.ischool.berkeley.edu/~hal/Papers/2013/pred-present-with-bsts.pdf">Predicting the Present with Bayesian Structural Time Series&lt;/a>). The problem has been referred to as &lt;em>nowcasting&lt;/em> because the goal is to maintain a current estimate of the value of a time series by forecasting the current value instead of the future value. The term itself is not very important as the task is essentially a standard forecasting problem.&lt;/p>
&lt;p>Consider a measurement like US weekly initial claims for unemployment (ICNSA), which is a leading indicator of recessions. Can we learn this week&amp;rsquo;s number before it is released? To answer this question we need a real-time signal correlated with the outcome (the ICNSA numbers). We can use &lt;a href="https://www.google.com/trends/correlate/">Google Correlate&lt;/a> to extract the 100 search terms that are most correlated with the ICNSA signal. Google Correlate finds search terms that vary in a similar way to your own time series, the ICNSA signal in our case. The 100 search term time series are our explanatory (also called exogenous) variables that can be included as regressors to improve the ICNSA forecast. The idea is that contemporaneous signals (&lt;em>exogenous variables&lt;/em>) are correlated in time with the unobserved signal (&lt;em>endogenous variable&lt;/em>) we are trying to estimate, and by regressing on these features we can improve our forecast. The temporal structure in these observed signals can be exploited to infer the behaviour of an unobserved signal. Here we will explore using structural time series models that decompose a signal into additive components consisting of a linear trend and a mean level.&lt;/p>
&lt;div style="background:white;">
&lt;p>&lt;img src="https://asifr.com/images/icnsa-signal.png" alt="US weekly initial claims for unemployment (ICNSA)">&lt;/p>
&lt;/div>
&lt;p>&lt;em>US weekly initial claims for unemployment (ICNSA)&lt;/em>&lt;/p>
&lt;h2 id="brief-description-of-structural-time-series-models">Brief description of structural time series models&lt;/h2>
&lt;p>The general approach to time series analysis is to first remove or model the parts that change through time to get a stationary series (a time series is stationary if its statistical properties, like variance, don&amp;rsquo;t change through time). Next, we use a time series model to capture the correlation in the stationary series. A series can be decomposed into:&lt;/p>
&lt;ul>
&lt;li>trend components (long-term change in the mean level)&lt;/li>
&lt;li>seasonality component (variation in mean that is periodic in nature and you generally know the period beforehand)&lt;/li>
&lt;li>cycles (variation that oscillates but not according to some known or fixed period)&lt;/li>
&lt;li>exogenous variables that have some correlation with the endogenous variable&lt;/li>
&lt;li>noise&lt;/li>
&lt;/ul>
&lt;p>The various components can be combined additively to model the endogenous variable $y$ at time $t$. Such additive models are desirable because we can interpret each term, progressively increase model complexity, and easily diagnose model performance. More concretely a typical model will be written as:&lt;/p>
&lt;p>$$
y_t=\mu_t+\gamma_t+\beta^T\boldsymbol{x}_t+\epsilon_t
$$&lt;/p>
&lt;p>where $y_t$ is the endogenous variable we want to forecast, $\mu_t$ captures changes in the mean level over time, $\gamma_t$ models the periodic nature of the signal, $\beta^T\boldsymbol{x}_t$ is a regression term with exogenous variables and $\epsilon_t$ is the noise term.&lt;/p>
&lt;p>The &lt;em>local linear trend&lt;/em> model decomposes the time series into a &lt;em>local level&lt;/em> component and a &lt;em>trend&lt;/em> component.&lt;/p>
&lt;p>$$
\mu_t = \mu_{t-1}+\delta_{t-1}+u_t
$$&lt;/p>
&lt;p>$$
\delta_t = \delta_{t-1}+v_t
$$&lt;/p>
&lt;p>The current level of the trend is $\mu_t$, the current &amp;ldquo;slope&amp;rdquo; of the trend is $\delta_t$, and the noise terms are $u_t$ and $v_t$.&lt;/p>
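&lt;p>To get a feel for the two recursions, the following sketch simulates a series from them with NumPy. It assumes Gaussian noise for $u_t$, $v_t$ and $\epsilon_t$ and leaves out the seasonal and regression terms, so the observation is simply $y_t=\mu_t+\epsilon_t$. The function name and its parameters are illustrative assumptions.&lt;/p>

```python
import numpy as np

def simulate_local_linear_trend(n_steps, sigma_u, sigma_v, sigma_eps,
                                mu0=0.0, delta0=0.0, seed=0):
    """Simulate y_t = mu_t + eps_t with a local linear trend for mu_t."""
    rng = np.random.default_rng(seed)
    mu = np.empty(n_steps)
    delta = np.empty(n_steps)
    mu[0], delta[0] = mu0, delta0
    for t in range(1, n_steps):
        # level: previous level + previous slope + level noise u_t
        mu[t] = mu[t - 1] + delta[t - 1] + rng.normal(0.0, sigma_u)
        # slope: random walk driven by the slope noise v_t
        delta[t] = delta[t - 1] + rng.normal(0.0, sigma_v)
    # observation noise eps_t is added on top of the level
    y = mu + rng.normal(0.0, sigma_eps, size=n_steps)
    return y, mu, delta

# With all noise switched off the model reduces to a straight line.
y, mu, delta = simulate_local_linear_trend(
    n_steps=5, sigma_u=0.0, sigma_v=0.0, sigma_eps=0.0, mu0=1.0, delta0=2.0
)
```

&lt;p>Setting all three standard deviations to zero recovers a deterministic line with intercept &lt;code>mu0&lt;/code> and slope &lt;code>delta0&lt;/code>. Increasing &lt;code>sigma_v&lt;/code> lets the slope drift over time, which is what makes the trend &amp;ldquo;local&amp;rdquo;.&lt;/p>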
&lt;p>This kind of model is referred to as &lt;code>UnobservedComponents&lt;/code> in &lt;code>statsmodels&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-python" data-lang="python">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f00">from&lt;/span> statsmodels.tsa.statespace.structural &lt;span style="color:#f00">import&lt;/span> UnobservedComponents
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># train on all time points before this and forecast time points after&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>interventionidx = &lt;span style="color:#f60">200&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># df: dataframe with ICNSA and exogenous variables&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#0f0"># regression_columns: exogenous variables&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>intervention = df.index[interventionidx]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>model = UnobservedComponents(
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> df.loc[:intervention, &lt;span style="color:#87ceeb">&amp;#39;ICNSA&amp;#39;&lt;/span>].values,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> exog = df.loc[:intervention, regression_columns].values,
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> level = &lt;span style="color:#87ceeb">&amp;#39;local linear trend&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>)
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>fit = model.fit(maxiter=&lt;span style="color:#f60">1000&lt;/span>)
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We can compare a few models: without exogenous variables from Google Correlate, with the top 10 most correlated search terms, and with the bottom 10 least correlated search terms. The figures below show the ICNSA values in blue and the model predictions in red. The model is trained on observations until 2008 (vertical dashed line) and forecasts are made for the unobserved time after 2008. 95% confidence intervals are in grey.&lt;/p>
&lt;div style="background:white;">
&lt;p>&lt;img src="https://asifr.com/images/forecast-without-exogenous.png" alt="ICNSA signal with model predictions">
&lt;img src="https://asifr.com/images/forecast-top-10.png" alt="ICNSA signal with top 10 search terms">
&lt;img src="https://asifr.com/images/forecast-bottom-10.png" alt="ICNSA signal with bottom 10 search terms">&lt;/p>
&lt;/div>
&lt;p>&lt;em>ICNSA signal with model predictions&lt;/em>&lt;/p>
&lt;p>It&amp;rsquo;s clear that adding features to the model can improve both the fit to the observed data and the forecast. But adding uncorrelated data can have undesired effects on your forecasts. The unobserved components model in &lt;code>statsmodels&lt;/code> is unable to pick the best features since it does not have any kind of regularization. Ideally, we want to select only those correlated search terms that give the best model fit and forecast. The original paper on the &lt;a href="http://people.ischool.berkeley.edu/~hal/Papers/2013/pred-present-with-bsts.pdf">Bayesian Structural Time Series&lt;/a> model provides a methodology for feature selection.&lt;/p>
&lt;p>In addition to applications in forecasting, state space models like the one described above can be used to infer the effect of an intervention, like an ad campaign, for counterfactual inference (see &lt;a href="https://ai.google/research/pubs/pub41854">Inferring Causal Impact from Bayesian Structural Time-Series Models&lt;/a> by Kay Brodersen et al. (2015)).&lt;/p>
&lt;p>Useful references:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://people.ischool.berkeley.edu/~hal/Papers/2013/pred-present-with-bsts.pdf">Original paper: Predicting the Present with Bayesian Structural Time Series&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://ai.google/research/pubs/pub41854">Inferring Causal Impact from Bayesian Structural Time-Series Models&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://google.github.io/CausalImpact/CausalImpact.html">Causal Impact R package&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://robjhyndman.com/">Rob Hyndman&amp;rsquo;s books and papers&lt;/a>, the teaching &lt;a href="https://robjhyndman.com/teaching/">slides&lt;/a> are particularly accessible&lt;/li>
&lt;/ul></description></item><item><title>State space models and the Kalman filter</title><link>https://asifr.com/state-space-models/</link><pubDate>Tue, 02 Feb 2016 00:00:00 +0000</pubDate><guid>https://asifr.com/state-space-models/</guid><description>
&lt;p>Linear state-space models are used in time-series analysis for filtering, prediction, and smoothing problems. They assume that the observations are generated linearly from a latent linear dynamical system. Although many real-world processes are non-linear, the linearity makes the model easy to analyze and efficient to estimate. In addition, many non-linear systems can be approximated using linear models, so the linear state-space model is an important tool for time-series applications.&lt;/p>
&lt;p>Consider the basic structural model with a &lt;em>local level&lt;/em> term and a &lt;em>trend&lt;/em> term:&lt;/p>
&lt;p>$$
y_t=\mu_t+\lambda_t w_t+\epsilon_t
$$&lt;/p>
&lt;p>$$
\mu_{t+1}=\mu_t+v_t+W_{1t}
$$&lt;/p>
&lt;p>$$
v_{t+1}=v_t+W_{2t}
$$&lt;/p>
&lt;p>$$
\lambda_{t+1}=\lambda_t+W_{3t}
$$&lt;/p>
&lt;p>where $\epsilon_t \sim N(0,\sigma_{y}^{2})$, $W_{1t} \sim N(0,\sigma_{\mu}^{2})$, and $W_{2t} \sim N(0,\sigma_{v}^{2})$. Here we allow the local level (intercept) and the trend (slope) to vary in time. Note the term local here is in contrast to global, where the level $\mu$ is fixed ($\sigma_{\mu}^{2}=0$) and there is a constant level across time.&lt;/p>
&lt;p>The model also includes an intervention term $\lambda_t w_t$, where $\lambda_t$ is a weighting term and $w_t$ is a dummy variable that is zero before the intervention and unity after the intervention.&lt;/p>
&lt;p>Setting all the noise terms $\eta=(\epsilon_t, W_{1t}, W_{2t})$ to zero and dropping the intervention term yields the simple equation of a line with constant intercept and slope. At $t=1$:&lt;/p>
&lt;p>$$
y_1=\mu_1
$$&lt;/p>
&lt;p>$$
\mu_1=\mu_0+v_0
$$&lt;/p>
&lt;p>$$
v_1=v_0
$$&lt;/p>
&lt;p>$$
y_1=\mu_0 + v_0
$$&lt;/p>
&lt;p>At $t=2$:&lt;/p>
&lt;p>$$
y_2=\mu_2
$$&lt;/p>
&lt;p>$$
\mu_2=\mu_1+v_1=\mu_0+v_0+v_0
$$&lt;/p>
&lt;p>$$
v_2=v_1=v_0
$$&lt;/p>
&lt;p>$$
y_2=\mu_0+2v_0
$$&lt;/p>
&lt;p>At $t=3$:&lt;/p>
&lt;p>$$
y_3=\mu_3
$$&lt;/p>
&lt;p>$$
\mu_3=\mu_2+v_2=\mu_0+v_0+v_0+v_0
$$&lt;/p>
&lt;p>$$
v_3=v_2=v_1=v_0
$$&lt;/p>
&lt;p>$$
y_3=\mu_0+3v_0
$$&lt;/p>
&lt;p>Therefore, with the state disturbances set to zero and the observation noise $\epsilon_t$ restored, the linear trend model simplifies to&lt;/p>
&lt;p>$$
y_t=\mu_0+v_0g_t+\epsilon_t
$$&lt;/p>
&lt;p>where $g_t=t$ for $t=1,&amp;hellip;,n$ is effectively time and $\mu_0$ and $v_0$ are the initial values of the level and the slope.&lt;/p>
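&lt;p>As a quick numerical check of the derivation above, the recursions can be iterated with every noise term set to zero. This is only a sketch: the function name &lt;code>noiseless_trend&lt;/code> and the starting values are arbitrary choices for illustration, and the intervention term is left out.&lt;/p>

```python
def noiseless_trend(mu0, v0, n):
    """Iterate the level and slope recursions with every noise term set to zero."""
    mu, v = mu0, v0
    ys = []
    for _ in range(n):
        # mu_{t+1} = mu_t + v_t and v_{t+1} = v_t (no disturbances)
        mu, v = mu + v, v
        # y_t = mu_t once the observation noise is dropped
        ys.append(mu)
    return ys


# Illustrative starting values: level 2.0 and slope 0.5
ys = noiseless_trend(mu0=2.0, v0=0.5, n=3)
print(ys)
```

&lt;p>Each value should equal $\mu_0 + t v_0$ for $t=1,2,3$, matching the closed form.&lt;/p>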
&lt;p>The state space model above can be expressed algebraically in one unified formulation. Using matrix algebra, these models can be written in the following general format:&lt;/p>
&lt;p>$$
y_t=Z_{t}^{T}\alpha_t+\epsilon_t
$$&lt;/p>
&lt;p>$$
\alpha_{t+1}=T_t \alpha_t + R_t \eta_t
$$&lt;/p>
&lt;p>The first equation is the &lt;em>observation&lt;/em> or &lt;em>measurement&lt;/em> equation because it links the observed data with the unobserved latent state $\alpha$. The second equation is the &lt;em>transition&lt;/em> or &lt;em>state&lt;/em> equation because it defines how the latent state evolves over time. $\alpha$ is the &lt;em>state vector&lt;/em>, $Z_t$ is the &lt;em>observation or design vector&lt;/em>, and $T_t$ is the &lt;em>transition matrix&lt;/em>. $R_t$ is usually an identity matrix; when it is not, it is called the &lt;em>selection matrix&lt;/em>. Finally, $\eta$ is the vector of &lt;em>state disturbances&lt;/em>.&lt;/p>
&lt;p>We can express the &lt;em>local linear trend&lt;/em> model in state space form:&lt;/p>
&lt;p>$$
\alpha_t=\begin{pmatrix}\mu_t\\v_t\end{pmatrix}, \quad
\eta_t=\begin{pmatrix}W_{1t}\\W_{2t}\end{pmatrix}, \quad
T_t=\begin{bmatrix}1 &amp;amp; 1\\0 &amp;amp; 1\end{bmatrix}, \quad
Z_t=\begin{pmatrix}1\\0\end{pmatrix}
$$&lt;/p>
&lt;p>$$
Q_t=\begin{bmatrix}\sigma_{\mu}^2 &amp;amp; 0\\0 &amp;amp; \sigma_{v}^2\end{bmatrix}, \quad
R_t=\begin{bmatrix}1 &amp;amp; 0\\0 &amp;amp; 1\end{bmatrix}
$$&lt;/p>
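&lt;p>As a check, multiplying out the transition equation with these matrices, and writing the state disturbances as $W_{1t}$ and $W_{2t}$ as in the scalar equations, should recover the level and slope recursions, while the observation equation picks out the level. The intervention term is left out here, and $Q_t$ is read as the covariance matrix of $\eta_t$:&lt;/p>
&lt;p>$$
\begin{pmatrix}\mu_{t+1}\\v_{t+1}\end{pmatrix}
=\begin{bmatrix}1 &amp;amp; 1\\0 &amp;amp; 1\end{bmatrix}
\begin{pmatrix}\mu_t\\v_t\end{pmatrix}
+\begin{pmatrix}W_{1t}\\W_{2t}\end{pmatrix}
=\begin{pmatrix}\mu_t+v_t+W_{1t}\\v_t+W_{2t}\end{pmatrix}
$$&lt;/p>
&lt;p>$$
y_t=\begin{pmatrix}1 &amp;amp; 0\end{pmatrix}\begin{pmatrix}\mu_t\\v_t\end{pmatrix}+\epsilon_t=\mu_t+\epsilon_t
$$&lt;/p>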
&lt;p>The primary tool for fitting state space models to data is the &lt;em>Kalman filter&lt;/em>, which recursively computes the predictive distribution $p(\alpha_{t+1}\mid y_{1:t})$ by combining $p(\alpha_{t}\mid y_{1:t-1})$ with $y_t$ using a standard set of formulas that is logically equivalent to linear regression.&lt;/p>
&lt;p>&lt;em>Intervention variables&lt;/em> can be added to assess the influence of an external change or stimulus on the development of a time series. Three possible interventions are the &lt;em>level shift&lt;/em>, the &lt;em>slope shift&lt;/em>, and the &lt;em>pulse&lt;/em>, where the value suddenly changes at the moment of the intervention and then immediately returns to the value before the intervention took place. Changes caused by a level shift or a slope shift are permanent after the intervention. A level shift can be expressed as follows:&lt;/p>
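&lt;p>To make the recursion concrete, here is a minimal sketch of the Kalman filter for the simpler &lt;em>local level&lt;/em> model ($y_t=\mu_t+\epsilon_t$, $\mu_{t+1}=\mu_t+W_{1t}$), where the state is a single number and the matrix formulas reduce to scalars. The function name &lt;code>local_level_filter&lt;/code>, the nearly diffuse prior (&lt;code>a1&lt;/code>, &lt;code>p1&lt;/code>), and the variances in the example are assumptions for illustration, not values from the models above.&lt;/p>

```python
def local_level_filter(ys, sigma2_eps, sigma2_mu, a1=0.0, p1=1e7):
    """Scalar Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + W_t.

    Returns the one-step-ahead predicted means and variances of the level.
    a1 and p1 are an assumed, nearly diffuse prior for the first level.
    """
    a, p = a1, p1
    means, variances = [], []
    for y in ys:
        v = y - a                      # prediction error for y_t
        f = p + sigma2_eps             # variance of the prediction error
        k = p / f                      # Kalman gain
        a = a + k * v                  # predicted level for the next step
        p = p * (1.0 - k) + sigma2_mu  # variance of that prediction
        means.append(a)
        variances.append(p)
    return means, variances


# Illustrative data: a constant signal observed 50 times
means, variances = local_level_filter([5.0] * 50, sigma2_eps=1.0, sigma2_mu=0.1)
print(round(means[-1], 3), round(variances[-1], 3))
```

&lt;p>With a constant input the predicted level settles on that constant and the predicted variance converges to a steady state, which is a useful sanity check before moving to the full matrix form.&lt;/p>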
&lt;p>$$
y_t=\mu_t+\lambda_t w_t+\epsilon_t
$$&lt;/p>
&lt;p>$$
\mu_{t+1}=\mu_t+W_{1t}
$$&lt;/p>
&lt;p>$$
\lambda_{t+1}=\lambda_t+W_{3t}
$$&lt;/p>
&lt;p>The dummy variable $w_t$ equals zero at all time points before the intervention and equals unity at the time points after the intervention.&lt;/p>
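&lt;p>The dummy variables are easy to build by hand. The sketch below assumes the intervention is given as a zero-based index and that the level shift is switched on from that index onward; the helper names are hypothetical, and the noise-free series uses a fixed level and weight purely for illustration.&lt;/p>

```python
def level_shift_dummy(n, intervention):
    """w_t = 0 before the intervention index and 1 from that index onward."""
    return [1 if t >= intervention else 0 for t in range(n)]


def pulse_dummy(n, intervention):
    """w_t = 1 only at the intervention index and 0 everywhere else."""
    return [1 if t == intervention else 0 for t in range(n)]


def level_shift_series(mu, lam, n, intervention):
    """Noise-free y_t = mu + lam * w_t with a fixed level and weight."""
    w = level_shift_dummy(n, intervention)
    return [mu + lam * w_t for w_t in w]


print(level_shift_dummy(6, 3))
print(pulse_dummy(6, 3))
print(level_shift_series(10.0, 2.5, 5, 2))
```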
&lt;p>The state space equations can also be cast as a probabilistic model such that for the measurement model we have $y_t\sim p(y_t \mid \alpha_t)$ and for the latent state model we have $\alpha_t\sim p(\alpha_t \mid \alpha_{t-1})$.&lt;/p>
&lt;p>Resources:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="http://www.ssfpack.com/CKbook.html">An Introduction to State Space Time Series Analysis by Commandeur and Koopman&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Pandoc static site generator</title><link>https://asifr.com/pandoc-static-site-generator/</link><pubDate>Wed, 20 May 2015 00:00:00 +0000</pubDate><guid>https://asifr.com/pandoc-static-site-generator/</guid><description>
&lt;p>Pandoc is one of the most useful command-line document converters I&amp;rsquo;ve used. Entire websites can be generated from simple text files. I&amp;rsquo;m partial to the Markdown syntax, but really any plain-text file can be read into Pandoc and it will output HTML, PDF, and even Word Docs. This website for instance is built using a single &lt;code>Makefile&lt;/code> that runs a Pandoc command.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>pandoc -r markdown+simple_tables+table_captions+yaml_metadata_block+auto_identifiers
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> +header_attributes+fenced_code_blocks+fenced_code_attributes+tex_math_dollars
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> -w html --toc --mathjax --include-before-body=&lt;span style="color:#f00">$(&lt;/span>PANDOC&lt;span style="color:#f00">)&lt;/span>/navigation.html
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> --template=&lt;span style="color:#f00">$(&lt;/span>PANDOC&lt;span style="color:#f00">)&lt;/span>/layout.html --css=&lt;span style="color:#f00">$(&lt;/span>PANDOC&lt;span style="color:#f00">)&lt;/span>/style.css -o &lt;span style="color:#f00">$(&lt;/span>ROOT&lt;span style="color:#f00">)&lt;/span>/&lt;span style="color:#eedd82">$@&lt;/span> $&amp;lt;
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Basically, for every Markdown &lt;code>.md&lt;/code> file in the directory, Pandoc converts it to an HTML file styled using a given template. The Markdown file supports YAML headers and the variables are made available in the template file. Templates also support basic logic (&lt;code>if/else&lt;/code>) and loops (&lt;code>for&lt;/code>), which allows for some very smart template files that can change the HTML output based on the YAML headers in your Markdown file.&lt;/p>
&lt;p>Pandoc also supports syntax highlighting for embedded code.&lt;/p></description></item><item><title>Data science at the command line</title><link>https://asifr.com/data-science-at-the-command-line/</link><pubDate>Thu, 25 Dec 2014 00:00:00 +0000</pubDate><guid>https://asifr.com/data-science-at-the-command-line/</guid><description>
&lt;p>The UNIX command line is a powerful tool for diving into large data files and piping specialized utilities to create summary statistics. This is a compilation of some useful commands.&lt;/p>
&lt;h2 id="parsing-csv">Parsing CSV&lt;/h2>
&lt;p>Read the first line of a CSV file, which contains the column names, and list each column name on a new line after splitting by comma and stripping surrounding quotes. Notice that &lt;code>awk&lt;/code> can create an array and has operations like &lt;code>length&lt;/code>, which returns the number of elements in an array.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>head -n &lt;span style="color:#f60">1&lt;/span> WebExtract.txt | awk &lt;span style="color:#87ceeb">&amp;#39;{ split($0, a, &amp;#34;,&amp;#34;); max = length(a) }
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> END { for (x=1; x&amp;lt;=max;x++) {gsub(/&amp;#34;/, &amp;#34;&amp;#34;, a[x]); print a[x]} }&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Take the average of a column of integers in a CSV file after filtering for a category.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>cat iris.csv | grep &lt;span style="color:#87ceeb">&amp;#39;Iris-setosa&amp;#39;&lt;/span> | awk -F &lt;span style="color:#87ceeb">&amp;#34;,&amp;#34;&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;{ sum += $1; n += 1; }
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> END { printf &amp;#34;%0.5f\n&amp;#34;, sum/n }&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To select a single line in a CSV file and output one column.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>sed -n &lt;span style="color:#87ceeb">&amp;#39;2405p;2405q&amp;#39;&lt;/span> WebExtract.txt | awk -F &lt;span style="color:#87ceeb">&amp;#34;\&amp;#34;,\&amp;#34;&amp;#34;&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;{ print $3; }&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>We can also select a subset of lines in the middle of the document and return a single column.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>sed -n &lt;span style="color:#87ceeb">&amp;#39;1000,1010p&amp;#39;&lt;/span> WebExtract.txt | awk -F &lt;span style="color:#87ceeb">&amp;#34;\&amp;#34;,\&amp;#34;&amp;#34;&lt;/span> &lt;span style="color:#87ceeb">&amp;#39;{ print $3; }&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Count the number of lines with &lt;code>wc&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>cat trip_data_1.csv | wc -l
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="parsing-json">Parsing JSON&lt;/h2>
&lt;p>JSON files can be parsed using &lt;code>jq&lt;/code>. The following command extracts the &lt;code>city&lt;/code> attribute from each record of a 35MB file, sorts the cities, and counts the occurrences of each.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>cat ./yelp_train_academic_dataset_business.json | jq &lt;span style="color:#87ceeb">&amp;#39;.city&amp;#39;&lt;/span> | sort | uniq -c
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
&lt;p>To test a command on a large file we can select just the first few lines.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>head -n &lt;span style="color:#f60">100&lt;/span> ./yelp_train_academic_dataset_business.json | jq &lt;span style="color:#87ceeb">&amp;#39;.city&amp;#39;&lt;/span> | sort | uniq -c
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To further filter the output for occurrences greater than 5 we can use &lt;code>awk&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>head -n &lt;span style="color:#f60">100&lt;/span> ./yelp_train_academic_dataset_business.json | jq &lt;span style="color:#87ceeb">&amp;#39;.city&amp;#39;&lt;/span> |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sort | uniq -c | awk &lt;span style="color:#87ceeb">&amp;#39;{if ($1 &amp;gt; 5) {print $0}}&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The same filter, with the threshold raised to 100, also works reasonably well on the entire dataset.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>cat ./yelp_train_academic_dataset_business.json | jq &lt;span style="color:#87ceeb">&amp;#39;.city&amp;#39;&lt;/span> |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sort | uniq -c | awk &lt;span style="color:#87ceeb">&amp;#39;{if ($1 &amp;gt; 100) {print $0}}&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To get the total number of entries belonging to cities with over 100 occurrences, we can sum the counts.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>cat ./yelp_train_academic_dataset_business.json | jq &lt;span style="color:#87ceeb">&amp;#39;.city&amp;#39;&lt;/span> |
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sort | uniq -c | awk &lt;span style="color:#87ceeb">&amp;#39;{ if ($1 &amp;gt; 100) {sum += $1;} } END { print sum }&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>To filter by a key within &lt;code>jq&lt;/code> we can pipe commands and return only the city names.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>head -n &lt;span style="color:#f60">3&lt;/span> ./yelp_train_academic_dataset_business.json | jq &lt;span style="color:#87ceeb">&amp;#39;select(.city |
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#87ceeb"> contains(&amp;#34;De Forest&amp;#34;)) | .city&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div></description></item><item><title>Punchcard visualization using D3.js</title><link>https://asifr.com/punchcard-visualization-using-d3js/</link><pubDate>Tue, 12 Aug 2014 00:00:00 +0000</pubDate><guid>https://asifr.com/punchcard-visualization-using-d3js/</guid><description>
&lt;p>Consider using a punchcard visualization when you want to represent both counts and proportionality over a time period across categorical data.&lt;/p>
&lt;p>&lt;img src="https://asifr.com/images/punchard-vis.webp" alt="Punchcard vis">&lt;/p>
&lt;p>The graphic is particularly useful in interactive documents with hover effects showing the underlying count.&lt;/p>
&lt;script src="https://gist.github.com/asifr/b80f6be64cf516f4a457.js">&lt;/script></description></item><item><title>Data Mining PubMed</title><link>https://asifr.com/data-mining-pubmed/</link><pubDate>Wed, 12 Mar 2014 00:00:00 +0000</pubDate><guid>https://asifr.com/data-mining-pubmed/</guid><description>
&lt;p>The National Institutes of Health provides a full programming interface to search PubMed called &lt;a href="http://www.ncbi.nlm.nih.gov/books/NBK25499/">E-Utilities&lt;/a>. Interacting with the PubMed database is conveniently done through simple HTTP requests, which return the article metadata as XML. Every article in PubMed has a title, author, abstract, journal, year, volume, issue, pages, and keywords, among other metadata. Getting the metadata from PubMed, however, involves two separate queries. Very simply, the first query returns a list of PubMed IDs (PMIDs) for articles matching the search criteria and the second query returns article data for a given PMID.&lt;/p>
&lt;p>The workflow is divided into two parts:&lt;/p>
&lt;p>First, query E-Search with your search term; it returns a list of PMIDs. Those PMIDs are then used to query E-Fetch for the article metadata.&lt;/p>
&lt;pre>&lt;code>http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?term=electrical+stimulation
&amp;amp;retmax=10&amp;amp;tool=pmquery&amp;amp;db=pubmed
&lt;/code>&lt;/pre>
&lt;p>E-Search returns a list of PMIDs:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#000;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-xml" data-lang="xml">&lt;span style="display:flex;">&lt;span> &amp;lt;eSearchResult&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Count&amp;gt;157380&amp;lt;/Count&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;RetMax&amp;gt;10&amp;lt;/RetMax&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;RetStart&amp;gt;0&amp;lt;/RetStart&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;IdList&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23858010&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23856563&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23856146&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23855510&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23839460&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23839375&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23853340&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23853339&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23853324&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;Id&amp;gt;23853296&amp;lt;/Id&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &amp;lt;/IdList&amp;gt;
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> ...
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Next, query E-Fetch for the article data. You can request multiple PMIDs at once and even specify the return type (XML, text, JSON). The API also supports pagination to iteratively get many thousands of results.&lt;/p>
&lt;pre>&lt;code>http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi
?db=pubmed&amp;amp;id=23856563,23858010&amp;amp;retmode=xml
&lt;/code>&lt;/pre>
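&lt;p>The two-step workflow can be sketched with the Python standard library. This is an illustration rather than the code in &lt;code>pmquery.py&lt;/code>: the helper names are hypothetical, the base URL and parameters are the ones shown in the example requests above, and no network request is made here.&lt;/p>

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL as it appears in the example requests above.
EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=10, tool="pmquery"):
    """Build the E-Search request that returns PMIDs for a search term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "tool": tool}
    return EUTILS + "/esearch.fcgi?" + urlencode(params)


def parse_pmids(xml_text):
    """Collect the text of every Id element in an E-Search response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("Id")]


def efetch_url(pmids, retmode="xml"):
    """Build the E-Fetch request for one or more PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": retmode}
    # safe="," keeps the comma-separated ID list readable in the URL
    return EUTILS + "/efetch.fcgi?" + urlencode(params, safe=",")


print(esearch_url("electrical stimulation"))
print(efetch_url(["23856563", "23858010"]))
```

&lt;p>In practice the E-Search URL would be fetched (for example with &lt;code>urllib.request.urlopen&lt;/code>), the response passed to &lt;code>parse_pmids&lt;/code>, and the resulting IDs handed to &lt;code>efetch_url&lt;/code>.&lt;/p>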
&lt;p>I&amp;rsquo;ve written several interfaces to access the PubMed API, including in PHP, Python, and C#. For instance, the Python script was written specifically to data-mine PubMed. Given a search term, &lt;a href="https://github.com/asifr/pmquery">&lt;code>pmquery.py&lt;/code>&lt;/a> will query PubMed and save each article to a text file. For some search terms, like &amp;ldquo;transcranial magnetic stimulation&amp;rdquo;, this results in over 9000 articles returned by PubMed, so the process is iterative and can take several minutes. The &lt;a href="https://github.com/asifr/PHP-PubMed-API-Wrapper">PHP implementation&lt;/a> provides a web-based search interface. For desktop-based applications, see the &lt;a href="https://github.com/asifr/Scholared-app/blob/master/Scholared/MainWindowController.cs">C# code&lt;/a> and the &lt;a href="https://github.com/asifr/Scholared-app">Scholared app&lt;/a> for a working example.&lt;/p></description></item></channel></rss>