1
00:00:00,460 --> 00:00:01,160
Welcome back.

2
00:00:01,180 --> 00:00:06,100
In this video, I will show you the fundamentals of text to speech and how to implement it with Kotlin

3
00:00:06,400 --> 00:00:12,580
for your Android apps, and what you can see here is that we have this little screen with a little text

4
00:00:12,580 --> 00:00:16,149
here and an EditText in which we can enter a text.

5
00:00:16,149 --> 00:00:20,980
And once we click on the button below, you can see that text to speech is executed.

6
00:00:21,500 --> 00:00:26,110
OK, so text to speech is pretty good in English, and this will, of course, only work if text to speech

7
00:00:26,110 --> 00:00:28,340
is installed on your phone.

8
00:00:28,570 --> 00:00:29,960
And let's check it out.

9
00:00:30,700 --> 00:00:33,240
What we need is, of course, a little UI to test it.

10
00:00:33,250 --> 00:00:38,710
So I have a TextView, then I have an EditText in which I can enter a text, and then I have this button.

11
00:00:39,410 --> 00:00:39,710
Right.

12
00:00:39,760 --> 00:00:44,750
And the idea is that whatever is entered in the EditText will be spoken once we press the button.

13
00:00:45,340 --> 00:00:48,340
Now going to our main activity where all the magic happens.

14
00:00:49,000 --> 00:00:53,890
And if you want to use text to speech, what is important is that we extend our main activity with the

15
00:00:53,890 --> 00:00:57,900
TextToSpeech.OnInitListener, as you can see.

16
00:00:58,090 --> 00:01:01,180
You can do that by adding a comma and then

17
00:01:02,170 --> 00:01:09,070
the listener that you want to add. So now our whole main activity will also follow the structure of

18
00:01:09,380 --> 00:01:15,990
that listener, and it will need to implement the onInit function.

19
00:01:16,390 --> 00:01:18,270
OK, so it won't work without this onInit.

20
00:01:18,280 --> 00:01:21,630
You can very easily test that: once you get rid of it,

21
00:01:21,910 --> 00:01:28,660
you can see that the main activity wants to add the members, so it wants us to implement the members, for

22
00:01:28,660 --> 00:01:29,080
example,

23
00:01:29,080 --> 00:01:30,140
the onInit.

24
00:01:30,550 --> 00:01:31,070
OK.

25
00:01:31,480 --> 00:01:34,870
And of course I'm going to bring it back here because we're going to need it.

26
00:01:35,410 --> 00:01:35,710
All right.

27
00:01:35,710 --> 00:01:40,980
Now, the first thing we need to do is to create a variable, which is our text to speech variable.

28
00:01:40,990 --> 00:01:44,770
And I'm going to call that one tts, standing for text to speech.

29
00:01:44,770 --> 00:01:45,250
It's nullable.

30
00:01:45,580 --> 00:01:49,450
And I'm going to set that to null because I'll initialize it in our onCreate method.

31
00:01:49,900 --> 00:01:57,880
You can see that our text to speech variable here will be assigned a TextToSpeech object, to which

32
00:01:57,880 --> 00:02:03,970
we need to pass the context and the listener. So you can say this in both, because the context is the main

33
00:02:03,970 --> 00:02:04,380
activity.

34
00:02:04,390 --> 00:02:08,139
So it should be spoken in the main activity.

35
00:02:08,350 --> 00:02:11,250
And at the same time, the main activity is also the listener.

36
00:02:11,440 --> 00:02:17,290
And this here is only possible because it is actually an OnInitListener here.

37
00:02:17,710 --> 00:02:21,040
So otherwise our main activity couldn't be the listener, OK?

38
00:02:21,070 --> 00:02:23,620
That's why it's important to understand that.
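A minimal sketch of the setup described so far, assuming an AppCompatActivity and a layout named activity_main (both are assumptions, not shown in the video); the onInit body is left empty here:
```kotlin
import android.os.Bundle
import android.speech.tts.TextToSpeech
import androidx.appcompat.app.AppCompatActivity
// The activity is both the context and the init listener, so "this" can be passed twice.
class MainActivity : AppCompatActivity(), TextToSpeech.OnInitListener {
    // Nullable and set to null until onCreate assigns the engine.
    private var tts: TextToSpeech? = null
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main) // assumed layout name
        tts = TextToSpeech(this, this)
    }
    // Called by Android once the engine has finished initializing.
    override fun onInit(status: Int) {
        // Status handling is discussed later in the video.
    }
}
```
Removing the onInit override from this sketch would reproduce the compiler complaint about unimplemented members.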

39
00:02:24,010 --> 00:02:30,220
OK, then we have this button, a speaker button, and once we click on it, I want to check if there

40
00:02:30,220 --> 00:02:33,970
is a text in the edit text.

41
00:02:34,150 --> 00:02:39,820
So if there's no text in the EditText, if it's empty, then I want to show a toast, which is just going to say

42
00:02:39,820 --> 00:02:48,590
enter a text to use TTS, and otherwise call this speakOut method, into which you pass a text.

43
00:02:48,640 --> 00:02:56,350
So what I'm doing is I'm getting the EditText's text and making a string out of it and passing it to

44
00:02:56,350 --> 00:02:57,160
the speakOut method.
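A rough sketch of that click handler, written as a standalone helper so it does not depend on view IDs; the function name wireSpeakButton, the callback parameter, and the exact toast wording are assumptions made for illustration:
```kotlin
import android.widget.Button
import android.widget.EditText
import android.widget.Toast
// Hypothetical helper: speakOut is passed in as a callback so the sketch stands alone.
fun wireSpeakButton(button: Button, input: EditText, speakOut: (String) -> Unit) {
    button.setOnClickListener {
        if (input.text.isEmpty()) {
            // Nothing entered: tell the user instead of speaking.
            Toast.makeText(button.context, "Enter a text to use TTS", Toast.LENGTH_SHORT).show()
        } else {
            // Convert the entered text to a String and hand it on.
            speakOut(input.text.toString())
        }
    }
}
```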

45
00:02:57,460 --> 00:02:59,020
Now let's look at the speakOut method.

46
00:03:00,560 --> 00:03:08,810
The speakOut method, what it does is it calls the speak method of tts, so the TextToSpeech

47
00:03:09,380 --> 00:03:12,910
class. It just calls the speak method. The speak method,

48
00:03:13,190 --> 00:03:18,470
it needs the text that it should speak, then the queue type.

49
00:03:18,470 --> 00:03:19,580
So TextToSpeech.

50
00:03:19,580 --> 00:03:21,230
QUEUE_FLUSH is what I'm using.

51
00:03:21,240 --> 00:03:28,640
So this will just stop the other element that was already in the queue and will start speaking.

52
00:03:29,060 --> 00:03:30,290
We can test that real quickly.

53
00:03:31,820 --> 00:03:38,930
So if I click it twice, you can see the old entry is getting deleted and only the new

54
00:03:38,930 --> 00:03:43,400
entry is there, so it's not queued up, but it's actually flushed.

55
00:03:43,410 --> 00:03:45,080
And this is what I'm doing with QUEUE_FLUSH.

56
00:03:45,110 --> 00:03:47,800
You can use different types here as well.

57
00:03:47,810 --> 00:03:52,570
You can use QUEUE_ADD, which will just add things up.

58
00:03:52,580 --> 00:03:58,910
So if you have multiple text to speech calls one after another, it will speak them in a row.

59
00:03:58,970 --> 00:04:00,040
So let's check that out.

60
00:04:01,580 --> 00:04:05,690
OK, so I did the test and I want to see if it works.

61
00:04:05,710 --> 00:04:09,890
This is a test, and I want to see if you hear it multiple times one after another.

62
00:04:09,890 --> 00:04:10,820
And it constantly

63
00:04:11,970 --> 00:04:18,870
speaks the text. So that's what happens when you use QUEUE_ADD, and QUEUE_FLUSH, of course, gets rid of

64
00:04:18,870 --> 00:04:23,200
the old entry and only speaks the most recently called one.
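The two queue modes could be sketched like this, assuming API level 21 or higher for this speak overload; passing the engine as a parameter, the flush flag, and the utterance ID string are illustrative assumptions:
```kotlin
import android.speech.tts.TextToSpeech
// Hypothetical standalone version of speakOut; in the video it is a method of the activity.
fun speakOut(tts: TextToSpeech?, text: String, flush: Boolean = true) {
    // QUEUE_FLUSH drops anything still waiting or playing; QUEUE_ADD appends to the queue.
    val queueMode = if (flush) TextToSpeech.QUEUE_FLUSH else TextToSpeech.QUEUE_ADD
    // The params Bundle may be null; the utterance ID is an arbitrary label here.
    tts?.speak(text, queueMode, null, "demo-utterance")
}
```
Calling it twice in quick succession with flush set to false should make both texts play in order, while the default drops the first one.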

65
00:04:23,820 --> 00:04:24,210
All right.

66
00:04:24,210 --> 00:04:27,030
If you want to know more about the speak method, you can check it out here.

67
00:04:27,360 --> 00:04:28,650
It's rather interesting.

68
00:04:28,660 --> 00:04:33,540
So it speaks the text using the specified queuing strategy and speech parameters.

69
00:04:33,750 --> 00:04:35,940
The text may be spanned with TtsSpans.

70
00:04:35,970 --> 00:04:39,980
So if you want to have some time in between, this method is asynchronous.

71
00:04:40,410 --> 00:04:47,970
So what happens is that the method just adds the request to the queue of text to speech requests and

72
00:04:47,970 --> 00:04:48,770
then returns.

73
00:04:48,780 --> 00:04:54,480
The synthesis might not have finished or even started at the time when this method returns.

74
00:04:54,780 --> 00:04:59,760
In order to reliably detect errors during synthesis, we recommend setting an utterance progress listener.

75
00:04:59,970 --> 00:05:02,220
So this is going to be a little bit too advanced.

76
00:05:02,220 --> 00:05:03,710
We're not going to require that.

77
00:05:03,900 --> 00:05:08,790
But if you really want to dig deeper into it, then that's definitely something that you should look

78
00:05:08,790 --> 00:05:13,590
into, on how to use this utterance progress listener.

79
00:05:13,890 --> 00:05:18,980
And what's really interesting for us is actually the text, the queue mode.

80
00:05:19,380 --> 00:05:24,090
Well, you can add some parameters for the request, but it can be null as well.

81
00:05:24,960 --> 00:05:30,150
And the utterance ID if you want to have a unique identifier for this particular request and you can

82
00:05:30,150 --> 00:05:31,980
pass that in as well.

83
00:05:32,970 --> 00:05:33,480
So.

84
00:05:34,400 --> 00:05:40,430
Now, that's our speakOut method, and let's go to the onInit method, because that's the one that

85
00:05:40,430 --> 00:05:41,430
is required.

86
00:05:41,450 --> 00:05:42,190
It's really important.

87
00:05:42,200 --> 00:05:43,510
We need to override that.

88
00:05:43,880 --> 00:05:51,260
And what happens here is that we get a status, because we know with onInit, if there is an "on", then this

89
00:05:51,260 --> 00:05:54,120
is something that will be called automatically and not by us.

90
00:05:54,500 --> 00:06:01,730
So we get a status which is of type integer, so we can check if the status is equal to TextToSpeech.

91
00:06:01,730 --> 00:06:02,230
SUCCESS.

92
00:06:02,510 --> 00:06:06,910
So if text to speech can be used, then this one will be true.

93
00:06:07,070 --> 00:06:10,500
So the if statement here, and we can go ahead and set the language.

94
00:06:10,550 --> 00:06:12,530
So here I'm setting it to English.

95
00:06:12,950 --> 00:06:16,400
So I store the result of whatever

96
00:06:17,310 --> 00:06:22,920
is the result of this setting. So something could go wrong here, setting the language could go wrong,

97
00:06:22,920 --> 00:06:25,740
for example, it could not be installed or it could be missing.

98
00:06:25,990 --> 00:06:33,270
So what we're checking next is if the result is actually TextToSpeech.LANG_MISSING_DATA or if the result is TextToSpeech.

99
00:06:33,270 --> 00:06:35,680
LANG_NOT_SUPPORTED.

100
00:06:36,180 --> 00:06:41,160
So, for example, if the language is not supported by the phone or if it's just not installed, then

101
00:06:41,670 --> 00:06:42,580
this will not work.

102
00:06:42,690 --> 00:06:48,690
So then we, in our case, just log something. You could, of course, inform the user instead and

103
00:06:48,760 --> 00:06:53,450
tell them to install text to speech once they press the text to speech button.

104
00:06:53,850 --> 00:06:55,770
But we're going to keep it simple here.

105
00:06:55,770 --> 00:06:57,390
It's really just a very basic demo.

106
00:06:57,690 --> 00:07:01,890
So you can see I'm just adding this log entry here, saying the language specified is not supported.

107
00:07:02,160 --> 00:07:08,060
And then we have this else statement here as well, which is for this status comparison.

108
00:07:08,070 --> 00:07:13,500
So if text to speech was unsuccessful from the start, then you can already see, OK, initialization

109
00:07:13,500 --> 00:07:13,880
failed.

110
00:07:13,890 --> 00:07:20,010
So something else might have come up or something else is an issue or creates an issue.
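The checks just described might look roughly like this; the class name DemoInitListener, the engine lookup lambda, the log tag, and the choice of Locale.US for English are assumptions, since in the video the activity itself implements onInit:
```kotlin
import android.speech.tts.TextToSpeech
import android.util.Log
import java.util.Locale
// Hypothetical listener class used only to keep this sketch self-contained.
class DemoInitListener(private val engine: () -> TextToSpeech?) : TextToSpeech.OnInitListener {
    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            // setLanguage returns a result code that tells us whether the language is usable.
            val result = engine()?.setLanguage(Locale.US)
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS", "The language specified is not supported!")
            }
        } else {
            // The engine itself could not be initialized.
            Log.e("TTS", "Initialization failed!")
        }
    }
}
```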

111
00:07:20,700 --> 00:07:28,410
OK, so that's onInit. Now let's look at onDestroy. onDestroy is called just before our application will

112
00:07:28,410 --> 00:07:31,690
shut down, or just before this activity will shut down.

113
00:07:32,040 --> 00:07:34,830
And what I want to do is I want to

114
00:07:35,820 --> 00:07:42,900
see if the text to speech object is empty or not. If it's not empty, then stop it and then shut it down.

115
00:07:43,840 --> 00:07:49,930
And then call super.onDestroy. What we do is we make sure that our program closes correctly and shuts

116
00:07:49,930 --> 00:07:54,970
down correctly without leaving any fragments of our text to speech in the background or things like

117
00:07:54,970 --> 00:07:55,210
that.
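A minimal sketch of that cleanup, assuming the same nullable tts field as before; the class name TtsCleanupActivity is hypothetical and only there so the snippet stands alone:
```kotlin
import android.speech.tts.TextToSpeech
import androidx.appcompat.app.AppCompatActivity
// Hypothetical activity showing only the teardown part.
class TtsCleanupActivity : AppCompatActivity() {
    private var tts: TextToSpeech? = null
    override fun onDestroy() {
        // Safe calls skip both steps when the engine was never created.
        tts?.stop()      // interrupt whatever is currently being spoken
        tts?.shutdown()  // release the engine's resources
        super.onDestroy()
    }
}
```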

118
00:07:55,860 --> 00:07:57,640
OK, so that's it.

119
00:07:57,850 --> 00:07:59,730
You can see it's rather basic, actually.

120
00:07:59,740 --> 00:08:03,230
So there's not that much that you need to implement in order to use text to speech.

121
00:08:03,250 --> 00:08:08,130
It's a very powerful tool and it's rather easy to implement and I really like that.

122
00:08:08,140 --> 00:08:14,110
So it's great if you have something like that where you really need to write very little code and

123
00:08:14,110 --> 00:08:17,980
Android is taking care of everything itself, which is amazing.

124
00:08:18,130 --> 00:08:22,870
And we're going to use text to speech, of course, in our application in our seven minute workout app.

125
00:08:23,060 --> 00:08:26,530
So I hope you enjoy that and see you in the next video.