/*!
The `csv` crate provides a fast and flexible CSV reader and writer, with
support for Serde.

The [tutorial](tutorial/index.html) is a good place to start if you're new to
Rust.

The [cookbook](cookbook/index.html) will give you a variety of complete Rust
programs that do CSV reading and writing.

# Brief overview

**If you're new to Rust**, you might find the
[tutorial](tutorial/index.html)
to be a good place to start.

The primary types in this crate are
[`Reader`](struct.Reader.html)
and
[`Writer`](struct.Writer.html),
for reading and writing CSV data respectively.
Correspondingly, to support CSV data with custom field or record delimiters
(among many other things), you should use either a
[`ReaderBuilder`](struct.ReaderBuilder.html)
or a
[`WriterBuilder`](struct.WriterBuilder.html),
depending on whether you're reading or writing CSV data.
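For example, here is a minimal sketch of reading semicolon-delimited data by
configuring the delimiter through a
[`ReaderBuilder`](struct.ReaderBuilder.html)
(the data below is made up for illustration):

```
let data = "city;country;population\nBoston;United States;4628910\n";
// Configure the field delimiter before building the reader.
let mut rdr = csv::ReaderBuilder::new()
    .delimiter(b';')
    .from_reader(data.as_bytes());
for result in rdr.records() {
    let record = result.expect("a CSV record");
    println!("{:?}", record);
}
```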

Unless you're using Serde, the standard CSV record types are
[`StringRecord`](struct.StringRecord.html)
and
[`ByteRecord`](struct.ByteRecord.html).
`StringRecord` should be used when you know your data to be valid UTF-8.
For data that may be invalid UTF-8, `ByteRecord` is suitable.
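For instance, here is a minimal sketch (again with made-up data) of iterating
over byte records, which works even when a field is not valid UTF-8:

```
let data = b"name,value\nfoo,\xff\xfe\n";
let mut rdr = csv::Reader::from_reader(&data[..]);
for result in rdr.byte_records() {
    // Each field is raw bytes; convert to UTF-8 only if and when you need to.
    let record = result.expect("a CSV record");
    println!("{:?}", record);
}
```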

Finally, the set of errors is described by the
[`Error`](struct.Error.html)
type.

The rest of the types in this crate mostly correspond to more detailed errors,
position information, configuration knobs or iterator types.

# Setup

Run `cargo add csv` to add the latest version of the `csv` crate to your
`Cargo.toml`.

If you want to use Serde's custom derive functionality on your custom structs,
then run `cargo add serde --features derive` to add the `serde` crate with its
`derive` feature enabled to your `Cargo.toml`.

# Example

This example shows how to read CSV data from stdin and print each record to
stdout.

There are more examples in the [cookbook](cookbook/index.html).

```no_run
use std::{error::Error, io, process};

fn example() -> Result<(), Box<dyn Error>> {
    // Build the CSV reader and iterate over each record.
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.records() {
        // The iterator yields Result<StringRecord, Error>, so we check the
        // error here.
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}
```

The above example can be run like so:

```ignore
$ git clone git://github.com/BurntSushi/rust-csv
$ cd rust-csv
$ cargo run --example cookbook-read-basic < examples/data/smallpop.csv
```

# Example with Serde

This example shows how to read CSV data from stdin into your own custom struct.
By default, the member names of the struct are matched with the values in the
header record of your CSV data.

```no_run
use std::{error::Error, io, process};

#[derive(Debug, serde::Deserialize)]
struct Record {
    city: String,
    region: String,
    country: String,
    population: Option<u64>,
}

fn example() -> Result<(), Box<dyn Error>> {
    let mut rdr = csv::Reader::from_reader(io::stdin());
    for result in rdr.deserialize() {
        // Notice that we need to provide a type hint for automatic
        // deserialization.
        let record: Record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}
```

The above example can be run like so:

```ignore
$ git clone git://github.com/BurntSushi/rust-csv
$ cd rust-csv
$ cargo run --example cookbook-read-serde < examples/data/smallpop.csv
```

*/

#![deny(missing_docs)]

use std::result;

use serde::{Deserialize, Deserializer};

pub use crate::{
    byte_record::{ByteRecord, ByteRecordIter, Position},
    deserializer::{DeserializeError, DeserializeErrorKind},
    error::{
        Error, ErrorKind, FromUtf8Error, IntoInnerError, Result, Utf8Error,
    },
    reader::{
        ByteRecordsIntoIter, ByteRecordsIter, DeserializeRecordsIntoIter,
        DeserializeRecordsIter, Reader, ReaderBuilder, StringRecordsIntoIter,
        StringRecordsIter,
    },
    string_record::{StringRecord, StringRecordIter},
    writer::{Writer, WriterBuilder},
};

mod byte_record;
pub mod cookbook;
mod debug;
mod deserializer;
mod error;
mod reader;
mod serializer;
mod string_record;
pub mod tutorial;
mod writer;

/// The quoting style to use when writing CSV data.
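///
/// For example, here is a minimal sketch of forcing quotes around every field
/// via `WriterBuilder::quote_style`, writing to an in-memory buffer:
///
/// ```
/// let mut wtr = csv::WriterBuilder::new()
///     .quote_style(csv::QuoteStyle::Always)
///     .from_writer(vec![]);
/// wtr.write_record(&["hello", "world"]).expect("write record");
/// let data = String::from_utf8(wtr.into_inner().expect("flush")).unwrap();
/// assert_eq!(data, "\"hello\",\"world\"\n");
/// ```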
#[derive(Clone, Copy, Debug)]
pub enum QuoteStyle {
    /// This puts quotes around every field. Always.
    Always,
    /// This puts quotes around fields only when necessary.
    ///
    /// They are necessary when fields contain a quote, delimiter or record
    /// terminator. Quotes are also necessary when writing an empty record
    /// (which is indistinguishable from a record with one empty field).
    ///
    /// This is the default.
    Necessary,
    /// This puts quotes around all fields that are non-numeric. Namely, when
    /// writing a field that does not parse as a valid float or integer, then
    /// quotes will be used even if they aren't strictly necessary.
    NonNumeric,
    /// This *never* writes quotes, even if it would produce invalid CSV data.
    Never,
    /// Hints that destructuring should not be exhaustive.
    ///
    /// This enum may grow additional variants, so this makes sure clients
    /// don't count on exhaustive matching. (Otherwise, adding a new variant
    /// could break existing code.)
    #[doc(hidden)]
    __Nonexhaustive,
}

impl QuoteStyle {
    /// Convert this to the csv_core type of the same name.
    fn to_core(self) -> csv_core::QuoteStyle {
        match self {
            QuoteStyle::Always => csv_core::QuoteStyle::Always,
            QuoteStyle::Necessary => csv_core::QuoteStyle::Necessary,
            QuoteStyle::NonNumeric => csv_core::QuoteStyle::NonNumeric,
            QuoteStyle::Never => csv_core::QuoteStyle::Never,
            _ => unreachable!(),
        }
    }
}

impl Default for QuoteStyle {
    fn default() -> QuoteStyle {
        QuoteStyle::Necessary
    }
}

/// A record terminator.
///
/// Use this to specify the record terminator while parsing CSV. The default is
/// CRLF, which treats `\r`, `\n` or `\r\n` as a single record terminator.
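///
/// For example, here is a minimal sketch of reading records terminated by `;`
/// via `ReaderBuilder::terminator` (the data below is made up):
///
/// ```
/// let data = "city,country;Boston,United States;";
/// let mut rdr = csv::ReaderBuilder::new()
///     .terminator(csv::Terminator::Any(b';'))
///     .from_reader(data.as_bytes());
/// let record = rdr.records().next().unwrap().expect("a CSV record");
/// assert_eq!(&record[0], "Boston");
/// assert_eq!(&record[1], "United States");
/// ```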
#[derive(Clone, Copy, Debug)]
pub enum Terminator {
    /// Parses `\r`, `\n` or `\r\n` as a single record terminator.
    CRLF,
    /// Parses the byte given as a record terminator.
    Any(u8),
    /// Hints that destructuring should not be exhaustive.
    ///
    /// This enum may grow additional variants, so this makes sure clients
    /// don't count on exhaustive matching. (Otherwise, adding a new variant
    /// could break existing code.)
    #[doc(hidden)]
    __Nonexhaustive,
}

impl Terminator {
    /// Convert this to the csv_core type of the same name.
    fn to_core(self) -> csv_core::Terminator {
        match self {
            Terminator::CRLF => csv_core::Terminator::CRLF,
            Terminator::Any(b) => csv_core::Terminator::Any(b),
            _ => unreachable!(),
        }
    }
}

impl Default for Terminator {
    fn default() -> Terminator {
        Terminator::CRLF
    }
}

/// The whitespace preservation behaviour when reading CSV data.
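///
/// For example, here is a minimal sketch of trimming surrounding whitespace
/// from both headers and fields via `ReaderBuilder::trim` (the data below is
/// made up):
///
/// ```
/// let data = " name , value \n foo , 1 \n";
/// let mut rdr = csv::ReaderBuilder::new()
///     .trim(csv::Trim::All)
///     .from_reader(data.as_bytes());
/// let record = rdr.records().next().unwrap().expect("a CSV record");
/// assert_eq!(&record[0], "foo");
/// assert_eq!(&record[1], "1");
/// ```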
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Trim {
    /// Preserves fields and headers. This is the default.
    None,
    /// Trim whitespace from headers.
    Headers,
    /// Trim whitespace from fields, but not headers.
    Fields,
    /// Trim whitespace from fields and headers.
    All,
    /// Hints that destructuring should not be exhaustive.
    ///
    /// This enum may grow additional variants, so this makes sure clients
    /// don't count on exhaustive matching. (Otherwise, adding a new variant
    /// could break existing code.)
    #[doc(hidden)]
    __Nonexhaustive,
}

impl Trim {
    fn should_trim_fields(&self) -> bool {
        self == &Trim::Fields || self == &Trim::All
    }

    fn should_trim_headers(&self) -> bool {
        self == &Trim::Headers || self == &Trim::All
    }
}

impl Default for Trim {
    fn default() -> Trim {
        Trim::None
    }
}

/// A custom Serde deserializer for possibly invalid `Option<T>` fields.
///
/// When deserializing CSV data, it is sometimes desirable to simply ignore
/// fields with invalid data. For example, there might be a field that is
/// usually a number, but will occasionally contain garbage data that causes
/// number parsing to fail.
///
/// You might be inclined to use, say, `Option<i32>` for fields such as this.
/// By default, however, `Option<i32>` will either capture *empty* fields with
/// `None` or valid numeric fields with `Some(the_number)`. If the field is
/// non-empty and not a valid number, then deserialization will return an error
/// instead of using `None`.
///
/// This function allows you to override this default behavior. Namely, if
/// `Option<T>` is deserialized with non-empty but invalid data, then the value
/// will be `None` and the error will be ignored.
///
/// # Example
///
/// This example shows how to parse CSV records with numerical data, even if
/// some numerical data is absent or invalid. Without the
/// `serde(deserialize_with = "...")` annotations, this example would return
/// an error.
///
/// ```
/// use std::error::Error;
///
/// #[derive(Debug, serde::Deserialize, Eq, PartialEq)]
/// struct Row {
///     #[serde(deserialize_with = "csv::invalid_option")]
///     a: Option<i32>,
///     #[serde(deserialize_with = "csv::invalid_option")]
///     b: Option<i32>,
///     #[serde(deserialize_with = "csv::invalid_option")]
///     c: Option<i32>,
/// }
///
/// # fn main() { example().unwrap(); }
/// fn example() -> Result<(), Box<dyn Error>> {
///     let data = "\
/// a,b,c
/// 5,\"\",xyz
/// ";
///     let mut rdr = csv::Reader::from_reader(data.as_bytes());
///     if let Some(result) = rdr.deserialize().next() {
///         let record: Row = result?;
///         assert_eq!(record, Row { a: Some(5), b: None, c: None });
///         Ok(())
///     } else {
///         Err(From::from("expected at least one record but got none"))
///     }
/// }
/// ```
pub fn invalid_option<'de, D, T>(de: D) -> result::Result<Option<T>, D::Error>
where
    D: Deserializer<'de>,
    Option<T>: Deserialize<'de>,
{
    Option::<T>::deserialize(de).or_else(|_| Ok(None))
}